Before the results produced by the program can be analyzed, a few preparatory steps are needed. First, make sure that you have reviewed the data and deselected all sites and events that do not meet the set criteria. Make sure that all your cells are present in the data set and that you have examined the plots in the program for any abnormalities. Once everything is in order, the data can be exported to a format suitable for analysis in R or SPSS. This section mostly deals with analysis in R, although most of the functions can also be found in SPSS. Scripts for analyzing the data produced by the program will be distributed with the program; in this section, only excerpts of these scripts are shown to highlight certain steps, together with some theoretical background on the methods used for the analysis.
Before the data can be analyzed, it is a good idea to verify that the data was exported correctly. When the export function was used in the main interface (File | Export... | SPSS and R), a text file with a .dat extension was saved at the specified location. To load the data into R, the following commands can be used:
01 #load the data by selecting a file ----
02 fname <- file.choose();
03 fusion.data <- read.delim(fname);
04

This will show a file selection dialog that will allow you to select the data file, which will then be read and stored in the fusion.data data frame. A data frame in R can be seen as a table with every column representing a variable, and every row an observation. To get a quick summary of the data, use the following command:
05 #show summary of the data
06 summary(fusion.data);
07

This summary will show some summary statistics, in particular the number of cases present in each of your conditions. When you exported the data, cases may have been included that did not contain any events. For proper analysis there are two possibilities: keep the full data set (some of the functions used below can handle missing values) or create a data set with the incomplete cases removed:
08 #create data frame with incomplete cases removed
09 complete.data <- fusion.data[complete.cases(fusion.data[,5]), ];
10

This command filters the events based on the fuse_time variable (column 5), which contains the value NA if there was no detection at that particular site. The next step is to create a data frame that contains the number of events per cell. To do this, we will use the cell_id and condition variables to determine the number of events per cell and condition. The following command can be used in R:
11 #determine the number of events per cell
12 library(plyr);
13 count.data <- ddply(complete.data, .(condition,cell_id), summarize, num_events=length(cell_id));
14

To use the ddply command, we need to load the plyr library (line #12). We then call ddply with complete.data as the data set to summarize. The summarize command is applied to each combination of the condition and cell_id variables, resulting in the number of events per condition per cell. The result is stored in a variable called num_events, which becomes part of the count.data data frame. The resulting data frame has one row for each cell in the data set and three columns: condition, cell_id and num_events. It is important to use the data set without missing values for this command; otherwise the missing values would also be counted as events by ddply.
The next step is to plot some of the data to examine its distribution and to perform some tests on how the data is distributed. To do this, you can simply create box plots of the data (note that you could also do this in the main interface of Fusion Analysis 2). To create some basic plots in R, you can start by plotting the fusion onset for the different conditions. This can be done with the standard plot function in R:
15 #make basic plots of the data
16 plot(fuse_time~condition, data=complete.data, ylab="Fusion onset (s)")
17 plot(fuse_dura~condition, data=complete.data, ylab="Fusion duration (s)")
18

This will generate two box-and-whisker (B-W) plots, one for the onset and one for the duration. An example script will also be provided that shows how to create plots using the ggplot2 library. The B-W plots show the distribution of the data, with outliers drawn as circles above and below the whiskers. Plots made with the ggplot2 library can be adjusted more precisely, but for a quick inspection the standard plot function will do. The built-in hist function cannot split a histogram by a factor such as condition, so for this we will use the ggplot2 library:
19 #make a histogram of the onset data ----
20 library(ggplot2)
21 hist.onset <- ggplot(complete.data, aes(x=fuse_time, color=factor(condition))) +
22   stat_bin(geom="step", binwidth=.5, size=.8, position="identity") +
23   labs(title="Fusion onset", x="Time (s)", y="Nr of Events") +
24   theme_bw()
25
26 hist.dura <- ggplot(complete.data, aes(x=fuse_dura, color=factor(condition))) +
27   stat_bin(geom="step", binwidth=.5, size=.8, position="identity") +
28   labs(title="Fusion duration", x="Time (s)", y="Nr of Events") +
29   theme_bw()
30
31 #show the plots in a grid
32 library(gridExtra)
33 grid.arrange(hist.onset, hist.dura, nrow=2)
34

To create the histogram of the duration, the procedure is identical, except that fuse_dura is used as the x variable (compare line #21 with line #26). The command used here draws the histogram as a stepped line, with the color depending on the condition. The advantage of this type of plot is that it becomes easier to compare more than two groups. If you wish to create a standard bar-type histogram, remove geom="step" from line #22. Two things to note about the use of the ggplot function: first, the plot is constructed by adding layers (geoms) to the object using the + symbol. Second, the construction of the plot object is separate from its display: without line #33 there would be no visible plot. By calling the object, you instruct R to display the plot in the active graphics device. Lines #23-24 are not required, but they add labels to the axes and the plot (line #23) and switch to the black-and-white theme, which uses as little color as possible (line #24). In this example the gridExtra library is used to arrange multiple plots in a grid, in this case in 2 rows. Note that this plot is different from the one generated by the Fusion Analysis 2 program: this plot shows all events of all cells together, while the program calculates and plots the average per cell. If you wish to see the individual histograms for each cell, you can add a single layer to the ggplot call to do this:
facet_wrap(~ cell_id, ncol=6, scales="free_y") +

This line has to be inserted after the ggplot() call and before the call to theme_bw(). It will generate a histogram for each cell (because cell_id is unique for every cell), spread out over 6 columns, with a free scale on the Y-axis, meaning that every panel is scaled individually. This type of plotting is not well suited to the grid.arrange function: it will work, but in most cases the plots will be too small for viewing. Please note that the plots will not be spread out over multiple windows, so if you have a large number of cells the individual plots will become very small! For this type of overview you can use the plotting function in the main interface.
As a formality, we will test whether the data is normally distributed (which is quite unlikely), and we will also check whether the variances are homogeneous across conditions. The latter check is useful in deciding which tests can be used on the data and which assumptions hold. Several tests are available to check the onset and duration for normality. The standard test is the Shapiro-Wilk test; however, in R this test can only be used when there are fewer than 5000 observations. In many data sets this limit is easily exceeded, especially when multiple detection was selected. There are two commonly used alternatives: the Kolmogorov-Smirnov (KS) test and the Anderson-Darling (AD) test. The KS test is not recommended for the type of data produced by the program, as it will contain many ties (i.e. identical values). It is therefore best to use the AD test, as shown below:
35 #load the library for the AD test
36 library(nortest)
37 ad.test(complete.data$fuse_time);
38 ad.test(complete.data$fuse_dura);
39
40 #plot the Q-Q plots for visual inspection
41 qqnorm(complete.data$fuse_time)
42 qqnorm(complete.data$fuse_dura)
43

Please note that the AD test can deal with missing data, so the fusion.data data frame could also have been used. If the AD test gives a p value less than .05, we reject the null hypothesis that the data is normally distributed. Similarly, if the points in the Q-Q plots deviate from the diagonal in a non-linear fashion, the data is not normally distributed. Due to the way the onset and duration of fusion events are measured, it is highly unlikely that these parameters are normally distributed: both variables will probably be heavily skewed, and both are truncated due to the finite measurement time. Many analyses and tests (e.g. ANOVA) are robust enough that the assumption of normality can be violated without the test becoming invalid. Another aspect of the data is more important, however: constant variance. Two tests are available to check the homogeneity of variance: the Bartlett test (parametric) and the Fligner-Killeen test (non-parametric). As the AD test indicated, the data is (likely) not normally distributed, so we will stick with the non-parametric test:
44 #Test for homogeneity of variance
45 fligner.test(fuse_time~condition, data=complete.data)
46 fligner.test(fuse_dura~condition, data=complete.data)
47

A significant result for this test leads to rejection of the null hypothesis that the variance is homogeneous. When this is the case, using the standard ANOVA family of tests for the final analysis is not recommended. This leaves tests that do not assume equal variances (e.g. Welch or Brown & Forsythe), non-parametric tests (e.g. Kruskal-Wallis), or the use of mixed effects (LME) and generalized linear models (GLM).
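As a small sketch of the first option, Welch's one-way test (which does not assume equal variances) is available in base R through the oneway.test function; the variable and data frame names follow the examples above:

#Welch's one-way test: does not assume equal variances
oneway.test(fuse_time ~ condition, data=complete.data, var.equal=FALSE)
oneway.test(fuse_dura ~ condition, data=complete.data, var.equal=FALSE)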
After examining the data and checking the assumptions for tests such as ANOVA or regression, you can proceed to testing your hypotheses. One of the main tests will be to see whether there is a difference between the groups identified by the condition variable. Assuming that the Fligner-Killeen tests were not significant, you can proceed with regression or ANOVA to test for differences between the groups. Otherwise, you can use a non-parametric test such as the Kruskal-Wallis test.
If you have more than two groups, a one-way ANOVA will test the null hypothesis that the groups have equal means: μ1 = μ2 = … = μn. Post-hoc tests or planned contrasts (see the multiple comparisons described below) then allow you to examine which group(s) differ. Regression can be performed on the data in a similar way, with similar assumptions applying (see Field, 3rd edition, section 7.6.2.1).
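As a brief sketch of this (assuming the ANOVA assumptions discussed above actually hold for your data; aov.1t is just an illustrative object name), an ANOVA followed by a Tukey post-hoc comparison could look like this:

#one-way ANOVA followed by pairwise post-hoc comparisons
aov.1t <- aov(fuse_time ~ condition, data=complete.data)
summary(aov.1t)
TukeyHSD(aov.1t)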
For completeness' sake, the following section will very briefly show how to perform a linear regression on the data obtained from the fusion analysis output. Please keep in mind that in most cases the assumptions that apply to linear regression will not hold for this type of data! To perform a linear regression, you can use the following commands in R (for an ANOVA, replace the lm command with aov):
48 #Perform a linear regression
49 mdl.1t <- lm(fuse_time ~ condition, data=complete.data)
50 mdl.1d <- lm(fuse_dura ~ condition, data=complete.data)
51 summary(mdl.1t)
52 summary(mdl.1d)
53

In data sets that contain a large number of events, in total and per cell, this type of analysis will almost always produce significant effects. The most important assumption that is violated by this method of testing is the independence of the data points: events from within one cell are likely to be more similar to each other than events from different cells (even within the same condition). It is therefore better to perform a multilevel analysis using mixed effects models to correct for this problem, or to use survival analysis for the fusion onset time.
When the tests for homogeneity of variance indicated a significant deviation, hypotheses can be tested using rank-based tests such as the Kruskal-Wallis test. The remaining assumption for such a non-parametric test is that the data points are not correlated, which in many cases does not strictly hold: for the event data, events within one cell will very likely have more in common than events from different cells. The count data, however, is more likely to contain uncorrelated observations. The basic testing procedure for the count data is shown in the next example; the commands for testing the other parameters are similar.
54 # Perform the Kruskal-Wallis test
55 kw.t <- kruskal.test(num_events ~ condition, data=count.data)
56
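The result of the test is stored in kw.t and still needs to be displayed. A short sketch, also showing the analogous calls for the event-level variables (keeping in mind the caveat about correlated observations within a cell):

#show the test result and repeat the test for the event-level variables
kw.t
kruskal.test(fuse_time ~ condition, data=complete.data)
kruskal.test(fuse_dura ~ condition, data=complete.data)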
If there are more than 2 conditions in your data set, you may also want to look at comparisons between the control condition and the experimental conditions. In statistics packages these comparisons are called contrasts, and R uses treatment contrasts by default, which result in every level being compared against a single reference (control) level. When you call the contrasts function on the complete.data$condition variable, you will get a table with the comparisons that will be made. You can define the contrasts manually if you wish to make more elaborate comparisons, or when you want to group certain conditions together. Once you have created your contrasts, or want to use the ones that were already set, you can use the glht function from the multcomp library to make the comparisons between the different conditions; the commands are shown after the short sketch below.
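A quick way to inspect the comparisons implied by the default contrasts (a sketch; this assumes condition was imported as a factor, otherwise wrap it in factor()):

#show the comparisons implied by the current contrasts for condition
contrasts(complete.data$condition)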
57 # Perform multiple comparisons ----
58 library(multcomp)
59 glht(mdl.1d, mcp(condition="Tukey"))
60

This will perform a multiple comparison based on the condition variable, with the type set to "Tukey". Setting the comparison type to "Tukey" results in pairwise comparisons, while setting it to "Dunnett" performs a many-to-one comparison against the reference level. For more complicated comparisons it is possible to group conditions together. The following example assumes there are 4 conditions, CTRL, EXP1, EXP2 and EXP3, with the comparisons of interest being CTRL versus the combination of EXP1 and EXP2, CTRL versus EXP3, and the combined EXP1/EXP2 versus EXP3:
# create comparison matrix
mcp.mtrx <- rbind("CTRL - EXP12" = c(2, -1, -1, 0),
                  "CTRL - EXP3"  = c(1, 0, 0, -1),
                  "EXP12 - EXP3" = c(0, -1, -1, 2))

The contrasts on each row should sum to 0, so in comparisons 1 and 3 the single condition is given a value of 2 to make the total add up to 0. Note that multiple comparisons using the glht function can only be made on model objects such as lm, glm or aov; an example of supplying this matrix to glht is sketched below. Also note that when you supply the contrasts manually, only the supplied contrasts are actually tested: in the example above there will be no comparison between CTRL and EXP1 or EXP2, as these were not defined in the matrix.
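A minimal sketch of how such a matrix can be supplied to glht, reusing the mdl.1d model fitted earlier and assuming the condition factor has exactly the four levels CTRL, EXP1, EXP2 and EXP3 in that order:

#use the custom contrast matrix for the multiple comparisons
cmp.custom <- glht(mdl.1d, linfct=mcp(condition=mcp.mtrx))
summary(cmp.custom)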
To perform multiple comparisons after a non-parametric test such as the Kruskal-Wallis test described above, we need a different library and function. The Dunn test from the dunn.test library supports several methods for adjusting the p values. The default adjustment is "none", meaning no correction for multiple comparisons. The bonferroni and holm methods control the family-wise error rate (FWER; type I errors), while bh (Benjamini-Hochberg) controls the false discovery rate (FDR) and is less stringent than the Bonferroni method.
61 #perform multiple comparisons on non-parametric tests
62 library(dunn.test)
63 dunn.test(complete.data$fuse_dura, complete.data$condition, method="bonferroni", list=TRUE)
64

If you have long condition names, it can be useful to set the list option of dunn.test to TRUE: this presents the results in the form of a list. Note that the Dunn test functionality in R does not allow you to modify the comparisons with a self-designed comparison matrix. To get specific comparisons, you need to either create a new condition or grouping variable in which specific experimental groups are combined, or take a subset of the data that excludes the unwanted experimental group(s). To create a new grouping variable based on the selected contrasts, you could use the following:
#create the same contrasts as described in the parametric testing
library(car)
complete.data$contrast.condition <- recode(complete.data$condition, "'CTRL'='CTRL'; c('EXP1','EXP2')='EXP12'; 'EXP3'='EXP3'")
dunn.test(complete.data$fuse_dura, complete.data$contrast.condition, method="bonferroni")

In this example we use the same experimental contrasts as in the parametric example, but we store the result in a new variable in the data frame called contrast.condition. To do the multiple comparisons with the new grouping, we simply replace complete.data$condition from the previous example with the new grouping variable complete.data$contrast.condition, as shown on the last line of the example.
As the data from the fusion analysis most likely violates the core assumptions of ANOVA and regression, the most useful way of testing for differences between conditions is a multilevel analysis using the functions provided by the nlme library (or alternatively, the lme4 library). The data as exported by the program is suitable for nesting the individual events within each cell per condition. To do this, we can fit the following mixed effects model in R using the lme() function:
54 #Load the library for mixed effects
55 library(nlme)
56
57 #Perform mixed effect modeling with nesting
58 mdl.2t <- lme(fuse_time ~ condition, random=~1|cell_id/id, data=complete.data, method="ML")
59 mdl.2d <- lme(fuse_dura ~ condition, random=~1|cell_id/id, data=complete.data, method="ML")
60 summary(mdl.2t)
61 summary(mdl.2d)
62

The nesting is specified in the random effects portion of the model and is written as cell_id/id, which is interpreted as the site (id) nested within a cell (cell_id). As a result, all sites within a cell are grouped together in the model and compared with the other cell groups. The results will show the intercept, which is the estimated parameter for the baseline condition, and the estimated parameters for the other conditions. The condition levels in R are sorted alphabetically, and the comparisons are always made relative to the first level. To change the level that is used as the control for the comparisons, you can add the following line somewhere after the loading of the data (line #03):
fusion.data$condition <- relevel(fusion.data$condition, "wt")

If you add it directly after line #03, all subsequent commands will use the new reference level. Note that the level indicated in this example is "wt"; this may be different for your data set, and the relevel command is case-sensitive! This model will compare the mean onset or duration per condition and will indicate whether two or more groups differ. It is also possible to use time directly as a variable in a mixed effects model. The important thing to realize is that there will be a relatively high correlation between neighboring time points, while data points further apart in time will be less correlated. If you wish to include the time point in your model, you will need to create a variable in your data set that encodes the time point. Furthermore, you will need to fit a model that includes a correlation structure appropriate for your data. The most obvious choices are the AR (autoregressive) and MA (moving average) correlation structures. For further information on this topic, see Pinheiro and Bates[2] or Gałecki and Burzykowski[3]. To add the correlation structure, add the correlation argument to your model and supply the structure to be used with the corARMA() function, using the parameter p for an AR model and q for an MA model. You can also define both parameters for an autoregressive-moving average (ARMA) model.
mdl.2t.AR <- lme(fuse_time ~ condition, random=~1|cell_id/id, data=complete.data, correlation=corARMA(p=2), method="ML")

If you wish to include the time component for the onset, the program allows the export of the data in a survival format, which restructures the data in the time-to-event format. For this type of analysis, see the section on survival analysis further ahead.
After fitting the model to your data, it is necessary to check the quality of the fitted model. To do this, plots of the residuals (the difference between the model and the data) should be made. The first type of plot shows the residuals per condition, to check for problems with the distribution of the residuals. To do so, use the following commands:
63 #Load the lattice library for bwplot
64 library(lattice)
65
66 #make plots for the residuals
67 plot(mdl.2t)
68 bwplot(resid(mdl.2t, type="pearson") ~ condition, data=complete.data)
69 qqnorm(mdl.2t, ~resid(.) | condition)
70

If the residuals in the first plot are not scattered evenly but have a certain shape (i.e. they spread out), the model may not describe the data correctly. If the B-W plot shows many outliers in one or more of your groups, this is also a cause for concern. Finally, the Q-Q plot should not deviate from a straight line: any non-linearity in this plot means the residuals are not normally distributed. If this is the case, it might help to change the weights of the model like so:
71 #Update the model with weights and test the model fit
72 mdl.2tw <- update(mdl.2t, weights=varPower())
73 anova(mdl.2t, mdl.2tw)
74

The call to anova compares the two models using a log-likelihood ratio test: if the comparison is significant, the model with the adjusted weights fits the data better. If the model mdl.2tw is better, check whether the residuals are more evenly spread by using the plot function again. If the residuals still look skewed, a different type of model might be required to determine the differences between conditions. For the fusion onset, take a look at the section on survival analysis for an alternative method of modeling the data.
At the beginning we created an object that holds the number of events per cell per condition (count.data). This type of data was not tested for normality, but the same procedures as for the other data apply. Given the nature of the data, it is highly unlikely that this data is normally distributed. When plotting the box plot for the number of events in the main interface (Plots | Box plots... | Nr of events), you will see that the cells normally cluster into a group with many events and a group with very few events. To model this data, we fit a generalized linear model to the data stored in the count.data data frame by calling the glm function included in the stats library.
75 #Fit a GLM to the count data
76 mdl.3 <- glm(num_events ~ condition, data=count.data, family=quasipoisson())
77 mdl.4 <- gls(num_events ~ condition, data=count.data)
78 summary(mdl.3)
79

The family argument used in the call to the glm function determines what type of error distribution is used when modeling the data. In this example the quasipoisson family is used, as count data has a tendency to be Poisson distributed. Event data, however, tends to be somewhat Poisson-like but not entirely: Poisson-distributed data has a very strong peak at a low number of events, which tapers off very rapidly for higher counts. As the box plots may have shown, this is usually not the case for fusion event numbers. Alternatively, you can use the gls function from the nlme library, which fits a linear model to the data. Checking the quality of the model fit is done in a similar way as before:
80 #make plots for the residuals
81 plot(mdl.3)
82 bwplot(resid(mdl.3, type="pearson") ~ condition, data=count.data)
83 qqnorm(resid(mdl.3, type="pearson"))
84

The same arguments for checking the model apply as before, with an even scattering of the residuals indicating a proper choice for the type of model. Other family types that are available are Gamma, gaussian, inverse.gaussian and poisson. The binomial families are not suited for this type of data and should only be used for data concerning proportions (i.e. values in the range 0…1).
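Swapping the error distribution only requires changing the family argument. A brief sketch, refitting the count model with a plain Poisson family for comparison:

#refit the count model with a plain Poisson error distribution
mdl.3p <- glm(num_events ~ condition, data=count.data, family=poisson())
summary(mdl.3p)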
Survival analysis is a term used for the analysis of time-to-event data, which is characterized by a designated time at which an event occurs, but which can also contain information on subjects or sites that do not show an event during the measurement time. The latter cases are called right-censored, but they still contribute information to the time-to-event data. Events can be measured up to the first event or, with small modifications to the data structure, analysed as multiple recurrent events.
The model that describes survival data in the most flexible way is the Cox proportional hazards model (Cox, 1972)[6]. Several extensions of the basic model exist to allow for single events, recurring events and the use of co-variates to model the data more accurately. The main parameter estimated from a Cox model is the hazard rate, which measures the rate at which events occur at any given time during the experimental duration. The Cox model is available in R through the survival library. In order to allow for multiple events, the data can be divided into strata according to 2 different strategies: Andersen & Gill (AG) and Prentice, Williams & Peterson (PWP). The AG method models the events as if they were independent and can be used to measure the recurrence rate of events. The PWP method is more flexible and can be used to model events individually while allowing for correlation between events. This is a conditional model, in which the number of events preceding an event influences the hazard rate.
When exporting the data for survival analysis, the program will ask for a file name and then process the data for survival analysis. The first step involves taking all the event onset times and formatting them in the (start, stop, event) format. This format consists of 2 columns that represent the times of the events: the start column holds the start of an episode, defined as either 0 for the first event or otherwise the time of the previous event. The stop column contains the time of the event and ends with the time of the ammonium application (as defined in the protocol). The event column specifies whether an event occurred (value 1) or not (value 0). By default, the last start-stop pair, which includes the ammonium response, has no event (the event column has a value of 0). In order to allow the multiple-events analysis, 2 columns labeled with _strata are added to the output: one can be used for the AG-style analysis (ag_strata), the other is meant for the PWP-style analysis (pwp_strata).
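As an illustration of this encoding (a purely hypothetical example, not actual program output): a single site with events at 12 s and 30 s and the ammonium application at 60 s would be represented by the following (start, stop, event) rows:

#hypothetical (start, stop, event) rows for one site with events at 12 s and 30 s
#and the ammonium application at 60 s (illustration only)
example.site <- data.frame(start = c(0, 12, 30),
                           stop  = c(12, 30, 60),
                           event = c(1, 1, 0))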
The 4 columns before the (start, stop, event) data allow for different types of models: the first column contains a unique identifier so that every row can be identified. The second column allows for a frailty model to account for variation between cells, while the site ID column does the same for individual sites. The group column holds the group label that the user applied to the cells; this can be used to compare the hazard rate between multiple groups (e.g. wild type versus knockout).
Several co-variates are added after the core data; these allow for a more precise model and help to discover which parameters are influential in modeling the hazard rate. The included parameters are as follows:
This section provides a brief overview of how to implement the analysis in R using the exported data. Assuming the data is loaded into a data frame called surv.data, the following is a brief example of how to analyse the data. Please keep in mind that this only covers the basics of the analysis: whenever a statistical model is fitted to data, the assumptions and the quality of the model still need to be verified.
The first step is to load the survival library in R. If it is not installed by default, install it first and then load it. Once the library is loaded, the Cox model can be fitted using either of the following commands:
01 # library and file loading
02 library(survival);
03 datFile <- file.choose();
04 surv.data <- read.delim(datFile);
05
06 # fitting the basic model
07 coxAG.m <- coxph(Surv(start,stop,event) ~ group + strata(ag_strata), data=surv.data);
08 coxPWP.m <- coxph(Surv(start,stop,event) ~ group + strata(pwp_strata), data=surv.data);
09
The first command (line 07) fits the model for recurrent events using the Andersen & Gill stratified data, while the second (line 08) fits it according to the Prentice, Williams & Peterson stratification. If the data contains more than one group, the first group in alphabetical order is used as the reference level, and the hazard rates of all other groups are compared to this group. Make sure to re-level the group variable so that your control group is the reference level (see the relevel example given earlier for the condition variable). To look at the results of the fitting, use the summary command on the fitted model(s):
10 # show a summary of the models
11 summary(coxAG.m);
12 summary(coxPWP.m);
13
The output will show the estimated hazard ratio (exp(coef)) and the confidence interval for that ratio: if 1 is part of the 95% confidence interval, the hazard ratio is not significantly different from 1, indicating that there is no additional hazard compared to the reference group. When adding co-variates, make sure to maintain a hierarchical model structure; this way you can compare whether adding additional co-variates improves the model significantly or not. To check this, the log-likelihood ratio test can be used to determine whether the more complex model is significantly better.
Adding co-variates is done by adding terms to the formula to indicate the parameters to be included in the model. To investigate the effect of synaptic localization on the hazard rate, models can be fitted using the following commands:
14 # additional models with a co-variate included
15 coxAGsyn.m <- coxph(Surv(start,stop,event) ~ group + cov_synaptic + strata(ag_strata), data=surv.data);
16 coxPWPsyn.m <- coxph(Surv(start,stop,event) ~ group + cov_synaptic + strata(pwp_strata), data=surv.data);
17
18 # and models with frailty included for sites
19 coxAGsynFr.m <- coxph(Surv(start,stop,event) ~ group + cov_synaptic + strata(ag_strata) + frailty(site_id), data=surv.data);
20 coxPWPsynFr.m <- coxph(Surv(start,stop,event) ~ group + cov_synaptic + strata(pwp_strata) + frailty(site_id), data=surv.data);
21
22 # check if the frailty improved the model: log likelihood test
23 anova(coxAGsyn.m, coxAGsynFr.m);
24 anova(coxPWPsyn.m, coxPWPsynFr.m);
25
This results in new models with the synaptic localization added. The hazard for the synaptic parameter is compared to the extra-synaptic hazard, which serves as the reference (its coefficient is 0 by default). A significant value for the synaptic parameter indicates that the hazard rate for synaptic events is higher (positive coefficient) or lower (negative coefficient) than for extra-synaptic events. Finally, we compare the same model with and without frailty to see if adding the frailty term improves the model: a significant result indicates that the model including frailty describes the data better. The AIC and BIC values (obtained with the AIC() and BIC() functions) also indicate which model is best: the model with the lowest value is preferred.
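A short sketch of these checks, reusing the models fitted above; the cox.zph call additionally checks the proportional hazards assumption, which should be verified for any Cox model:

# does adding the synaptic co-variate improve the basic model? (log likelihood test)
anova(coxAG.m, coxAGsyn.m);

# compare models by information criteria: lower is better
AIC(coxAG.m, coxAGsyn.m);
BIC(coxAG.m, coxAGsyn.m);

# check the proportional hazards assumption of the co-variate model
cox.zph(coxAGsyn.m);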
The following is a list of reading material on the topics covered on this page.
[1] | The R Book – Michael J. Crawley. Wiley & Sons (2007) ⇒ General text on R, data analysis and testing using R. [free download] |
[2] | Mixed-Effects Models in S and S-PLUS – José C. Pinheiro and Douglas M. Bates. Springer Verlag (2000) ⇒ Text about mixed-effects models as used in the nlme library within R. [free download] |
[3] | Linear Mixed-Effects Models Using R – Andrzej Gałecki and Tomasz Burzykowski. Springer Verlag (2013) ⇒ Text about mixed-effects models using R with theoretical background and examples. [book] |
[4] | Discovering Statistics Using R – Andy Field. SAGE Publications (2009) ⇒ General text on statistics and modeling with implementations in SPSS and R. [book/e-book] |
[5] | Modelling Survival Data in Medical Research – David Collett. CRC Press (2015) ⇒ Book on the theoretical foundation of survival analysis and its application in medical research. [e-book] |
[6] | Regression Models and Life-Tables – David R. Cox. J Royal Stat Soc. Series B, 34(2), pp. 187-220 (1972) ⇒ Publication on time-to-event data and hazard functions, basis for the Cox proportional hazard model. [article] |