The fusion analysis program uses several modules that contain transformation and detection algorithms.
This page gives a brief overview of the algorithms applied to the data in each module. The
information on this page is based on version 1.5.0 of the program.
Detrending module
This module allows for detrending of the data when there is a significant baseline shift. The method applied is
set in the preferences dialog. The following methods are currently supported (a brief sketch of the linear and
smooth variants follows the list):
None (default): no detrending is applied.
Linear: a fitted straight line is subtracted from the data.
Smooth: a moving average of the data with a large block size is subtracted.
Auto: the better fit of a linear and an exponential function is subtracted.
Exponential: a fitted single exponential function is subtracted.
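As an illustration of the kind of operation involved, the snippet below shows what the linear and smooth variants
could look like in Matlab. It is a minimal sketch on synthetic data: the variable names and the block size are
assumptions, not the program's actual code or defaults.

    % Synthetic trace: a fusion-like bump on a linear baseline drift.
    t     = 1:500;                                    % frame numbers
    trace = exp(-((t-250)/20).^2) + 0.01*t + 0.05*randn(1,500);

    % Linear: subtract a fitted straight line.
    p          = polyfit(t, trace, 1);
    detrLinear = trace - polyval(p, t);

    % Smooth: subtract a moving average with a large block size.
    blockSize  = 101;                                 % assumed window (frames)
    detrSmooth = trace - conv(trace, ones(1,blockSize)/blockSize, 'same');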
Detrending is only recommended if the data has a very poorly behaved baseline: most of the detrending algorithms used
here will also distort the fusion event onset and amplitude to some degree. Always try the default method (None)
first: the detection module is quite robust against trends in the baseline and will detect a very large
proportion of events. If dual-color data is available, detrending is applied to both channels.
Once the data has been loaded and detrended, fusion events can be detected using Fusion detection
from the Analysis menu. There is also an option to explore the parameters for automatic detection via
Explore detection in the Analysis menu. This opens a new window that shows the parameters used for
automatic detection and allows interactive adjustment.
Explore detection
Before using the automatic detection, verify the settings used for detection: the baseline length parameter
cannot exceed the total number of frames in the trace. The settings can be changed via
Edit | Preferences in the menu. Once the settings are approximately correct, the explore detection interface can be
started (see Figure 1).
Figure 1: Explore detection interface, showing three plots (from top to bottom): raw data and a scaled
derivative; first derivative and median filtered derivative with thresholds; and median filtered derivatives for the
rising and falling edge with peak detections and thresholds.
The plots are updated whenever one of the fields on the left changes. Please note that some parameters
are not immediately visible in the plots. The dotted lines in the bottom plot indicate the different levels of the
threshold: the distance between consecutive dotted lines equals one clean level, so the second dotted line marks two
times the detection threshold, the third line three times, and so on.
To explore the detection, load data and select the explore detection option from the menu. Once the window opens, a trace can be
selected using the previous and next buttons on the bottom left. Once a trace is loaded, the plots show the different traces
used to find the peaks. The main detection is in the bottom plot, with the derivative data used for detection of the
rising phase (magenta) and the falling phase (blue). The top plot contains the raw data with the final detected peaks overlaid.
Keep in mind that the peaks in this plot have been filtered for local detection level using the clean-up settings (parameters 2-4).
The parameters to the left show different options that control the detection of events, but some of the settings only apply to the
second channel. The following list shows the influence of the parameters per channel:
Base len: influences the initial baseline measurement and threshold.
Clean {len,lvl,off} [Ch1,Ch2]: influences the cleaning of events if they are below the level of the threshold.
Level {R,F} [Ch1,Ch2]: the level above baseline for the rising and falling edge. Rising edge is only for Ch1.
Median {1,2} [Ch1,Ch2]: the size of the median filter in stage 1 (Ch2 only) and stage 2 (Ch1 and Ch2).
Pk Dist {1,2} [Ch1,Ch2]: The minimum distance in frames between detected peaks for Ch1 and Ch2, respectively.
Pk Hght [Ch1,Ch2]: the number of times above SD that the detected peak must be, as indicated by the threshold in the plots.
Pk Prom [Ch1,Ch2]: the number of times above SD that the peak has to be above its immediate neighbors.
Note that when single-color data is loaded, the fields that only influence the second channel are disabled. If the program is run on a Matlab version
prior to R2014b, the peak prominence parameter is also disabled, because this parameter is not available in
the findpeaks() function of those versions. The values in the explore detection interface are NOT transferred to the settings until you press the Apply
button: if you close the window, the modifications will be lost!
When the user has not supplied manual fusion times, the program will attempt to find the start and end of
the fusion event(s) automatically. To do so, it employs a search method using the findpeaks() function
in Matlab. The algorithm goes through the following stages for the pHluorin data (a sketch of these
stages in Matlab follows the list):
calculate the first derivative of the data
zero all negative values
apply median filter to the gradient data
establish a threshold based on the gradient baseline standard deviation
use findpeaks() on the gradient data using the threshold parameters
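As an aid in reading the stages above, the snippet below sketches them in Matlab on a synthetic trace. The
parameter names mirror the explore-detection fields, but the values are illustrative and this is not the
program's actual source.

    % Synthetic trace: a baseline of 100 frames followed by a step.
    data      = [ones(1,100) linspace(1,3,5) 3*ones(1,100)] + 0.05*randn(1,205);
    baseLen   = 50;                       % frames used for the baseline
    medianLen = 5;                        % median filter length
    levelR    = 5;                        % rising-edge level (factor x SD)
    pkDist    = 10;                       % minimum peak distance (frames)

    grad         = [0 diff(data)];                    % 1) first derivative
    grad(grad<0) = 0;                                 % 2) zero negative values
    gradMed      = medfilt1(grad, medianLen);         % 3) median filter
    thresh       = levelR * std(gradMed(1:baseLen));  % 4) baseline SD threshold
    [pks, locs]  = findpeaks(gradMed, 'MinPeakHeight', thresh, ...
                             'MinPeakDistance', pkDist); % 5) peak detection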
After the initial detection of the peaks, each peak is verified against the criteria set by the user
in the preferences. The event starts at the first point above the threshold level, set by the user as a factor times
the standard deviation of the local baseline. The peak in the gradient is used to calculate the peak fold
values shown in the results table in the main interface ("peak fold" column). This value represents how many
times the peak exceeds the threshold and is a measure of the detection quality. A value of
1.5 is a good cutoff to discriminate good detections if the data has stable noise and
baseline levels.
When the start point(s) of fusion have been determined, the next step involves finding the end points of the fusion
event(s). This detection follows the same stages, but looks at the negative gradient of the signal. After detection,
the end points are checked against the start points: every starting point can have only one end point, and
the end point of fusion event 1 cannot lie beyond the starting point of event 2. Once this check has been completed, the
remaining fusion end points are corrected for the detection level threshold. Each fusion point gets a quality value
associated with it, calculated from the signal-to-noise ratio (SNR) of the peak. This value is derived by dividing the peak
value by the standard deviation of the baseline, and is then converted to a value between 0 and 1
using a sigmoid curve (see Figure 2). The result is used for the color coding of the triangles in the main plot and in the results
table in the interface ("quality" column). A value of 0.3 (corresponding to an SNR of roughly 5) is a
good cutoff to discriminate good detections.
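The exact sigmoid the program uses is shown in Figure 2; the sketch below only approximates it with a generic
logistic whose midpoint and slope are assumptions (chosen to give a quality of 0.5 at an SNR of 6 and roughly
0.875 at an SNR of 8):

    % SNR of a detected peak: peak value divided by the baseline SD.
    baseline  = 0.05*randn(1,50);             % illustrative baseline segment
    peakValue = 0.4;                          % illustrative peak value
    snr       = peakValue / std(baseline);

    % Generic logistic mapping to (0,1); parameters are assumed.
    mid     = 6;                              % assumed midpoint (quality 0.5)
    slope   = log(7)/2;                       % assumed: ~0.875 at SNR 8
    quality = 1 / (1 + exp(-slope*(snr - mid)));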
Figure 2: Sigmoid curve for mapping the peak SNR to a value between 0 and 1. Dotted lines indicate the values
for events with an SNR of 3, 6 and 8 and their associated quality values (0.0312, 0.5 and 0.875, respectively).
If a second channel is present, detection of the falling edge is applied to the second channel as well. The stages are
as follows, each with its own preference settings (a sketch follows the list):
apply a first stage median filter on the data
calculate the negative first derivative
zero all negative values in this gradient data
apply a second stage median filter on the gradient
establish the threshold based on a factor of the max value
use findpeaks() on the gradient data using the threshold parameters
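The sketch below walks through these second-channel stages on a synthetic trace; all names and values are
illustrative, not the program's actual code.

    % Synthetic second-channel trace with a falling step.
    data2   = [3*ones(1,100) linspace(3,1,5) ones(1,100)] + 0.05*randn(1,205);
    med1Len = 3;                                 % stage-1 median filter length
    med2Len = 5;                                 % stage-2 median filter length
    levelF  = 0.5;                               % assumed factor of the max

    dataMed      = medfilt1(data2, med1Len);     % 1) first-stage median filter
    grad         = -[0 diff(dataMed)];           % 2) negative first derivative
    grad(grad<0) = 0;                            % 3) zero negative values
    gradMed      = medfilt1(grad, med2Len);      % 4) second-stage median filter
    thresh       = levelF * max(gradMed);        % 5) threshold from max value
    [pks, locs]  = findpeaks(gradMed, 'MinPeakHeight', thresh); % 6) detection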
After the initial detection, each of the fusion points is verified to see that it falls below the threshold set by the
user. This threshold is calculated in a similar way to the threshold for the first channel.
Figure 3: Explore detection interface, showing three plots (from top to bottom): raw data and a scaled
derivative; first derivative and median filtered derivative with thresholds; and median filtered derivatives for the
falling edge with peak detections and thresholds.
Note that the fusion events in the second channel are not matched to the events in the first channel, but are
detected completely independently. The detection threshold can be controlled using the F level parameter. During
classification, this discrepancy is accounted for when classifying events, but the number of events is based only on the
first channel.
Slow event detection
The automatic detection is based on finding a large change in intensity over a short period of time (usually 1 or 2 frames).
However, it is sometimes the case that events in the data set have a much slower onset (more than 5-10 frames). For
these events, there is a detection method that tries to find the start of slow events. This detection uses 5 different methods
to detect a deviation from the baseline:
Maximum of the first derivative on the smoothed data
Deviation from the median by 4× IQR (interquartile range)
Deviation from the mean by 6× SD
Deviation from the median by 4× IQR using a sliding window
Alarm rate method using a sliding window and deviation from the mean by 3× SD
Each of the methods returns either a location for an event or a NaN value if it cannot find one. The detection
accepts an event if at least 3 of the methods return a location and the locations are not more than 15 frames apart. After this check,
the time point of fusion is determined as the median of the five methods (excluding the NaN values). The goodness value is the sum
of the weights of the methods that found the event, with weights of 0.05, 0.1, 0.2, 0.25 and 0.3999, respectively. This detection is applied
automatically when the automatic detection fails.
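The consensus step might look like the sketch below; it assumes each method returns one frame index (or NaN) and
that the goodness sums the weights of the contributing methods, which is an interpretation of the description above.

    locs    = [112 115 NaN 118 120];        % illustrative per-method outputs
    weights = [0.05 0.1 0.2 0.25 0.3999];   % per-method goodness weights

    valid = ~isnan(locs);
    if nnz(valid) >= 3 && (max(locs(valid)) - min(locs(valid))) <= 15
        fusionFrame = median(locs(valid));  % consensus time point
        goodness    = sum(weights(valid));  % sum of contributing weights
    else
        fusionFrame = NaN;                  % no slow event found
        goodness    = 0;
    end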
Classification
Classification occurs on two levels: the first is the event level, where each event is categorized according to a set of parameters. The
second is the site level, where each site the user added is characterized based on the synapse channel data. If no synapse data
is present, the default value is used.
Event classification
During classification, the fusion events are processed for parameters that can be used to group events into categories.
There are currently 3 main categories and a miscellaneous category. The classification can be adjusted using the
parameters in the preferences dialog. The following categories are determined based on parameters set by the user:
Transient events: event duration
Persistent events: event duration, rise time
Slow deacidification: rise time
Unclassified/miscellaneous
The second channel can be used to subdivide the transient and persistent events into subcategories. The classification
module assigns labels to each category and subcategory based on the parameters set in the preferences. The format of the
classification has three levels, separated by dots: main.sub.release. The numbers have the following meaning,
depending on the level (the "—" indicates that the value is not used for the corresponding level):
value   main                   sub       release
1       transient              fast      yes
2       persistent             slow      no
3       slow deacidification   —         —
4       unknown                —         —
9       —                      unknown   —
A three-digit combination is made for each event based on the classification parameters. If the site contains no events,
the classification will be set to 4.9 without further specification. For example, a fast transient event with release will
be categorized as 1.1.1, while a slow persistent event without release will be categorized as 2.2.2. When there
is no red channel data available, the release value will be set to 2 by default. A few examples of classification are
shown in Figure 4, showing the event in the green channel using colored triangles, and the event in the red channel using an
open circle.
Figure 4: Examples of the main categories: fast transient with release (1.1.1), slow transient with release (1.2.1)
and fast persistent with (2.1.1) and without (2.1.2) release.
There are 5 parameters that can be set for the classification of events. To set them, go to Edit | Preferences and modify
the values in the classification panel (Figure 5).
Figure 5: Preferences for classification.
The first parameter indicates the maximum duration of a transient event in seconds. Events longer
than this value become persistent or slow deacidification events. The second parameter determines the maximum duration of a fast
transient event in seconds; events longer than this duration are classified as slow transient events. The third parameter separates
the persistent events into slow and fast according to duration: events shorter than this parameter are fast persistent. Please note
that a value for this parameter lower than or equal to the first parameter will result in all persistent events being labeled as slow. The fourth
parameter is the maximum rise time of the event in seconds; it determines whether a non-transient event is classified as a persistent or slow
deacidification event. This check is only performed on events that are not considered transient. A sketch of this decision logic is shown below.
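The snippet sketches how these four duration/rise-time parameters could combine into the main and sub digits; the
field names, comparison directions, and the sub value used for non-transient categories are assumptions, not the
program's actual implementation.

    function code = classify_event(duration, riseTime, p)
    %   p.maxTransient - max duration of a transient event (s)
    %   p.maxFast      - max duration of a fast transient event (s)
    %   p.persistSplit - fast/slow duration split for persistent events (s)
    %   p.maxRise      - max rise time before "slow deacidification" (s)
    if duration <= p.maxTransient
        main = 1;                                % transient
        sub  = 1 + (duration > p.maxFast);       % 1 = fast, 2 = slow
    elseif riseTime <= p.maxRise
        main = 2;                                % persistent
        sub  = 1 + (duration >= p.persistSplit); % 1 = fast, 2 = slow
    else
        main = 3;                                % slow deacidification
        sub  = 9;                                % sub level not used
    end
    code = sprintf('%d.%d', main, sub);          % release digit added separately
    end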
Figure 6: The detection points in the first channel (triangles) are used to determine the duration of the event. In case of a
second channel detection (open circle), the time between the end point in the first channel and the detection in the second channel is
used to determine the release status. The length of this trace represents 10 seconds, the duration is 1 second and the Ch2 delay is 0.5 seconds.
Finally, the fifth parameter determines whether a release event results in the loss of cargo. This parameter requires the second channel to be present
in the data set. The delay of the release relative to the end point of fusion (see Figure 6) can be set to allow precise control over whether an event
has release of the cargo. A long delay means that the decrease in the green channel can be uncoupled from the loss in the red channel, making the release
less likely to be related to the event in the green channel. When this parameter is set to Inf, the entire duration of each green event will be
searched for a release event in the second channel. This means that the max red delay is limited only by the event duration, which can differ per event.
An example is shown in Figure 7.
Figure 7: When using a Ch2 max delay value of 1 second, this would be classified as a fast persistent event without
release (2.1.2). When the value of Ch2 max delay is set to Inf, this will be a fast persistent event with
release (2.1.1).
In this example, the same categorization can be obtained by setting Ch2 max delay to something like 30 seconds. However, a Ch2 max delay
larger than the actual event will also consider events in channel 2 that occur before the onset of the event. By using
the Inf value, the algorithm limits Ch2 max delay to the duration of each event. A sketch of this check is shown below.
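The exact search window the program uses is an assumption here, based on Figure 6 and the paragraph above; all
names and values in this sketch are illustrative.

    frameRate = 10;                        % assumed frames per second
    maxDelay  = 1;                         % Ch2 max delay in seconds (or Inf)
    startCh1  = 100; endCh1 = 110;         % Ch1 event bounds in frames
    ch2Locs   = [105 240];                 % Ch2 detections in frames

    if isinf(maxDelay)
        % Inf: search the entire duration of the green event.
        release = any(ch2Locs >= startCh1 & ch2Locs <= endCh1);
    else
        % Finite: Ch2 detection within maxDelay of the Ch1 end point.
        release = any(abs(ch2Locs - endCh1) <= maxDelay * frameRate);
    end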
Site classification
If localization data was present in the original data import, classification will also attempt to label each site as a
synaptic or extra-synaptic site. This distinction is based on k-means clustering of the intensity data supplied for each
fusion site. The clustering is performed per cell and separates the intensity data into 2 clusters (a sketch follows
below). In the preferences, there is also an option to apply a manual threshold to the intensity. When using the manual
option, make sure that the acquisition settings are identical for all localization data: only a single manual
threshold value can be set. When different acquisition settings were used, always disable the manual threshold setting in the
preferences dialog prior to classification. When the results do not match the observations, the user can try to apply a
threshold manually in ImageJ and measure the intensities in a mask image: this sets the areas above the threshold to a
value of 255 and those below the threshold to 0. Using these measurements as the synapse intensity in the localization
of the imported data allows the user to set the manual threshold to any value larger than 0 to classify the synaptic
events. Depending on the quality of the mask and threshold, this gives the clearest results. If another marker is used
instead of a synapse marker, labeling it as a synapse marker in the protocol sheet will still allow classification of the
localization. If no localization data is present, all sites are labeled as extra-synaptic, and the thresholding method
has no effect.
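For the clustering step, a per-cell sketch could look like the following; the variable names and the assumption
that the higher-intensity cluster is the synaptic one are illustrative, not taken from the program's code.

    intensity = [12 340 15 410 380 9]';   % illustrative site intensities (one cell)
    idx       = kmeans(intensity, 2);     % separate into 2 clusters

    % Assume the cluster with the higher mean intensity marks synaptic sites.
    means    = [mean(intensity(idx==1)), mean(intensity(idx==2))];
    [~, hi]  = max(means);
    synaptic = (idx == hi);               % true = synaptic, false = extra-synaptic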
If a SynD analysis was performed on the data files using SynD aggregate, it is possible to estimate the pool size and the release probability using the
intermediate save files from SynD (*-save.mat). When the files are available, the pool size calculations can be performed by selecting
Analysis | Determine pool size (SynD) from the menu. After selecting the relevant SynD files, you will be asked to select the relevant field from
the extracted data. Generally, you will need to select the synapseIntensitySynMean field to get the measurements for vesicle intensities. It is
assumed that the fusion events were located in the synapse channel when the analysis was performed. For information on how to perform SynD analysis, refer
to the SynD documentation. The function calculates three pool sizes:
uncorrected pool: this is the raw output of SynD
cell-corrected pool: this uses the mode per cell in the first 2 quantiles
total-corrected pool: this uses the mode over all cells
For each of the pools, the corresponding release probability is also calculated by dividing the number of events by that pool size (a minimal
sketch follows). The data is stored in the data structure. Note that when modifying the data structure by adding or removing events, the release
probability needs to be recalculated.
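A minimal sketch of that division, with assumed field names and pool values, is shown below; the pool-size
corrections themselves are not reproduced here.

    nEvents = 4;                                 % fusion events at a site
    pools   = struct('uncorrected',    37.2, ...
                     'cellCorrected',  30.1, ...
                     'totalCorrected', 28.5);    % assumed pool sizes
    fn = fieldnames(pools);
    for k = 1:numel(fn)
        releaseProb.(fn{k}) = nEvents / pools.(fn{k});  % events / pool size
    end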
A script called script_extract_release_probability is provided that exports the release probability and pool
sizes from a session file to a text file for statistical analysis. Export can also be done directly from the main interface, using the
File | Export | Release probability option from the menu. Use the menu option if you want the standard export format, and use the script if you wish to
customize the export routine or include additional calculations.