Res
Jones, 2019
t-tests
This workshop will cover three types of t-test: the one-sample, two-sample and paired t-tests. It
will cover when to use each test, the assumptions of each test, how to perform it and how to
report it accurately. Work through each t-test, making sure you understand what each line of
code does. In some places, code is not provided because you have covered it already, e.g.
reading in your data and producing histograms. It is vital that you understand and can perform
these statistical tests yourselves, as you will work alone in the exam and will be expected to
perform statistical tests.
As you will have discovered, there are many ways of doing the same thing in R (e.g. reading in
and subsetting data), so there is no single ‘right’ answer. Use whichever method works for you
and that you understand. However, there are specific ways of performing these tests and of
reporting their results. For more information on t-tests, please refer to sections 9.4-9.6 of the
core textbook.
For this workshop, I have highlighted code in grey and things you have to do in red. It is also
important that you read through the background material, as this may be examined.
This workshop is based on material by Childs (2018).
One sample t-test
When do we use a one-sample t-test?
The one-sample t-test is the simplest of statistical tests. It is used in situations where we have
a sample of a numeric variable from a population, and we need to compare the population
mean to a particular value. The one-sample t-test uses information in the sample to evaluate
whether the population mean is likely to be different from this value. The expected value
might be something predicted from theory, or some other prespecified value we are
interested in. Here are a couple of examples:
• We have a theoretical model of foraging behaviour that predicts an animal should
leave a food patch after 10 minutes. If we have data on the actual time spent by 25
animals observed foraging in the patch, we could test whether the mean foraging time
is significantly different from the prediction using a one-sample t-test.
• We are monitoring sea pollution and have a series of water samples from a beach. We
wish to test whether the mean density of faecal coliforms (bacteria indicative of
sewage discharge) for the beach can be regarded as greater than the legislated limit.
A one-sample t-test will enable us to test whether the mean value for the beach as a
whole exceeds this limit.
How does the one-sample t-test work?
Imagine we have taken a sample of a variable (called ‘X’) and we want to evaluate whether
the mean is different from some number. Here’s an example of what these data might look
like, assuming a sample size of 50 was used:
Figure 1: Example of data used in a one-sample t-test
The red line shows the sample mean, and the blue line shows the expected value (this is 10,
so this example could correspond to the foraging study mentioned above). The observed
sample mean is about one unit larger than the expected value. The question is, how do we
decide whether the population mean is really different from the expected value? Perhaps the
difference between the observed and expected value is due to sampling variation. Here’s how
a frequentist tackles the question:
• We first have to set up an appropriate null hypothesis, i.e. a hypothesis of ‘no effect’
or ‘no difference’. The null hypothesis in this instance is that the population mean is
equal to the expected value.
• We then have to work out what the sampling distribution of the mean looks like under
this null hypothesis. This is the null distribution. We use the null distribution to assess
how likely the observed result is under the null hypothesis.
The new idea is that we now make an extra assumption. The key assumption of the one-sample
t-test is that the variable is normally distributed in the population. The distribution
above looks roughly bell-shaped, so it seems plausible that it was drawn from a normal
distribution.
Now, because we’re prepared to make the normality assumption, the whole process of
carrying out the statistical test is very simple. The consequence of the normality assumption
is that the null distribution has a known mathematical form, related to the t-distribution.
We can use this knowledge to construct the test of statistical significance. But
instead of using the whole sample, as we did with bootstrapping, we only need three pieces
of information to construct the test: the sample size, the sample variance, and the sample
mean. No resampling of data is involved.
So how does a one-sample t-test actually work? It is carried out as follows:
Step 1. Calculate the mean. That’s simple enough. This is our ‘best guess’ of the unknown
population mean. However, its role in the one-sample t-test is to allow us to construct a test
statistic in the next step.
Step 2. Estimate the standard error of the sample mean. This gives us an idea of how much
sampling variation we expect to observe. The standard error doesn’t depend on the true value
of the mean, so the standard error of the sample mean is also the standard error of any mean
under any particular null hypothesis. This step boils down to applying a simple formula
involving the sample size and the standard deviation of the sample:
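In symbols, this is the standard formula for the standard error of the mean:

```latex
\mathrm{SE}_{\bar{x}} = \sqrt{\frac{s^2}{n}} = \frac{s}{\sqrt{n}}
```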
…where s² is the square of the standard deviation (the sample variance) and n is the
sample size. The standard error of the mean gets smaller as the sample size grows or the
sample variance shrinks.
Step 3. Calculate a ‘test statistic’ from the sample mean and standard error. We calculate this
by dividing the difference between the sample mean (step 1) and the expected value by the
estimated standard error (step 2):
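In symbols, writing x̄ for the sample mean and μ₀ for the expected value under the null hypothesis, the test statistic is:

```latex
t = \frac{\bar{x} - \mu_0}{\mathrm{SE}_{\bar{x}}} = \frac{\bar{x} - \mu_0}{\sqrt{s^2 / n}}
```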
Why is this useful? If our normality assumption is reasonable, then under the null hypothesis this
test statistic follows a t-distribution with n - 1 degrees of freedom. So this particular test statistic
is also a t-statistic. That’s why we label it t. This knowledge leads to the final step…
Step 4. Compare the t-statistic to the theoretical predictions of the t-distribution to assess
the statistical significance of the difference between observed and expected value. We
calculate the probability that we would have observed a difference with a magnitude as large
as, or larger than, the observed difference, if the null hypothesis were true. That’s the p-value
for the test.
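For readers who want to see the four steps in action, here is a minimal sketch in R. It uses simulated data (an assumption: 50 values drawn from a normal distribution with mean 11 and standard deviation 2, compared with an expected value of 10, loosely mirroring Figure 1), carries out steps 1 to 4 by hand, and then runs t.test() on the same values so that the two results can be compared.

```r
# Simulated data (an assumption for illustration, not workshop data)
set.seed(1)
x   <- rnorm(50, mean = 11, sd = 2)
mu0 <- 10                                  # expected value under H0

n      <- length(x)
x_bar  <- mean(x)                          # Step 1: sample mean
se     <- sqrt(var(x) / n)                 # Step 2: standard error of the mean
t_stat <- (x_bar - mu0) / se               # Step 3: t-statistic
dof    <- n - 1                            # degrees of freedom
p_val  <- 2 * pt(-abs(t_stat), df = dof)   # Step 4: two-sided p-value

# The same test carried out by t.test(), for comparison
res <- t.test(x, mu = mu0)
cat("by hand: t =", round(t_stat, 4), " p =", signif(p_val, 4), "\n")
cat("t.test : t =", round(unname(res$statistic), 4),
    " p =", signif(res$p.value, 4), "\n")
```

pt() returns the lower-tail probability of the t-distribution, so doubling the tail area beyond |t| gives the two-sided p-value that t.test() reports by default.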
We could step through the actual calculations involved in these steps in detail, using R to help
us, but there’s no need to do this. We can let R handle everything for us. But first, we should
review the assumptions of the one-sample t-test.
Assumptions of the one-sample t-test
There are a number of assumptions that need to be met in order for a one-sample t-test to
be valid. Some of these are more important than others. We’ll start with the most important
and work down the list in order of importance:
1. Independence. People tend to forget about this one. We’ll discuss the idea of
independence later when we consider principles of experimental design. For now, we
just need to state why the assumption matters: if the data are not independent, the
p-values generated by the one-sample t-test will be smaller than they should be.
2. Measurement scale. The variable being analysed should be measured on an interval
or ratio scale, i.e. it should be a numeric variable. It doesn’t make much sense to apply
a one-sample t-test to a variable that isn’t measured on one of these scales.
3. Normality. The one-sample t-test will only produce completely reliable p-values if the
variable is normally distributed in the population. This assumption is less important
than many people think. The t-test is fairly robust to mild departures from normality
when the sample size is small, and when the sample size is large the normality
assumption matters even less.
We don’t have the time to properly explain why the normality assumption is not too
important for large samples, but we will at least state the reason: it is a consequence of
something called the ‘central limit theorem’.
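The central limit theorem can be illustrated with a short simulation. This sketch assumes a strongly right-skewed variable (an exponential distribution, chosen purely for illustration) and shows that the distribution of sample means becomes much more symmetric as the sample size grows from 5 to 100. The skewness() helper is a simple hand-written moment estimate, not a function from base R.

```r
set.seed(2)

# Moment-based skewness: 0 for a perfectly symmetric distribution
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

# Means of many samples of a given size from a skewed (exponential) variable
sample_means <- function(size, reps = 10000) {
  replicate(reps, mean(rexp(size, rate = 1)))
}

means_n5   <- sample_means(5)
means_n100 <- sample_means(100)

skew_n5   <- skewness(means_n5)    # theory: 2 / sqrt(5), about 0.89
skew_n100 <- skewness(means_n100)  # theory: 2 / sqrt(100) = 0.20

cat("skewness of sample means, n = 5:  ", round(skew_n5, 2), "\n")
cat("skewness of sample means, n = 100:", round(skew_n100, 2), "\n")
```

The raw exponential variable has a skewness of 2 whatever the sample size; it is only the distribution of the sample mean that approaches a normal shape, and that is what the t-test relies on.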
How do we evaluate these assumptions? The first two are really aspects of experimental
design, i.e. we can only evaluate them by thinking carefully about how the data were gathered
and what was measured. What about the third assumption? One way to evaluate the normality
assumption is by plotting the sample distribution using something like a histogram or a dot
plot. If the sample size is small, and the sample looks approximately normal when we visualise
its distribution, then it is probably fine to use the t-test. If we have a large sample we don’t
need to worry much about moderate departures from normality. It’s hard to define what
constitutes a ‘large’ sample, but hundreds of observations would often be safe.
Carrying out a one-sample t-test in R
We’re going to use the plant morph example to learn how to carry out a one-sample t-test in
R. The data were ‘collected’ to compare the mean dry weight of purple and green morphs.
This question can’t be tackled with a one-sample t-test. Instead, let’s pretend that we have
unearthed a report from 30 years ago that found the mean size of purple morphs to be 710
grams. We want to evaluate whether the mean size of purple plants in the contemporary
population is different from this expectation, because we think they may have adapted to
local conditions.
Read the data in MORPH_DATA.CSV into an R data frame, giving it the name morph.data. We
only need the purple morph data for this example, so we need to subset the data to get hold
of only the purple plants. Call this morph.purple.
Next, we need to explore the data. Plot a histogram of the purple morph weight data to
check the normality assumption. You have learnt how to do this in the data visualisation
workshop.
Carrying out the test
It is fairly straightforward to carry out a one-sample t-test in R. The function we use is
called t.test (no surprises there). We read the data into a data frame called morph.data, then
subsetted it to keep only the purple plants. The resulting data frame, morph.purple, has two
columns: Weight contains the dry weight biomass of purple plants, and Colour is an index
variable that indicates which sample (plant morph) an observation belongs to. We don’t need
the Colour column at this point.
Here’s the R code to carry out a one-sample t-test:
t.test(morph.purple$Weight, mu = 710)
We have suppressed the output because we first want to focus on how to use the t.test function.
We have to supply two arguments to control what the function does:
1. The first argument (morph.purple$Weight) is simply a numeric vector containing the
sample values. We can’t give t.test a data frame when doing a one-sample test.
Instead, we have to pull out the column we’re interested in using the $ operator, in
this case Weight.
2. The second argument (called mu) sets the expected value we want to compare the
mean to, so mu = 710 tells the function to compare the mean to a value of 710. This
can be any value we like, depending on the question we’re asking.
That’s it for setting up the test. Let’s take a look at the output:
The first line tells us what kind of t-test we used. This says: One Sample t-test. The next line
reminds us about the data. This says: data: morph.purple$Weight, which is R-speak for ‘we
compared the mean of the Weight variable to an expected value’. Which value? This is given
later.
The third line of text is the most important. This says: t = 3.1811, df = 76, p-value = 0.002125.
The first part of this, t = 3.1811, is the test statistic, i.e. the value of the t-statistic. The second
part, df = 76, summarises the ‘degrees of freedom’. For a one-sample t-test this is the sample
size minus one (here, 77 purple plants), and it is essentially a measure of how much power our
statistical test has (see the box below). The third part, p-value = 0.002125, is the
all-important p-value.
The p-value indicates that there is a statistically significant difference between the mean dry
weight biomass and the expected value of 710 g (p is less than 0.05). Because the p-value is
less than 0.01 but greater than 0.001, we report this as ‘p < 0.01’. Read through
the hypotheses and p-values document for more information here.
Don’t ignore the fourth line of text (alternative hypothesis: true mean is not equal to 710).
This reminds us what the alternative to the null hypothesis is (H1). It tells us what expected
value was used in the test (710).
The next two lines show us the ‘95% confidence interval’ for the population mean. We don’t
really need this information, but we can think of this interval as a rough summary of the likely
values of the true mean. In reality, a confidence interval is more complicated than that.
The last few lines summarise the sample mean. This is useful to see the mean of your data
and the direction of the difference, i.e. is the mean of the purple morph data higher or lower
than 710?
Summarising the result
Having obtained the result we need to write the conclusion. Remember, we are testing a
scientific hypothesis, so always go back to the original question to write the conclusion. In this
case the appropriate conclusion is:
The mean dry weight biomass of purple plants is significantly different from the expectation
of 710 grams (t = 3.18, d.f. = 76, p < 0.01).
This is a concise and unambiguous statement in response to our initial question. The
statement indicates not just the result of the statistical test, but also which value was used in
the comparison. It is sometimes appropriate to give the value of the sample mean in the
conclusion:
The mean dry weight biomass of purple plants (767 grams) is significantly different from the
expectation of 710 grams (t = 3.18, d.f. = 76, p < 0.01).
Notice that we include details of the test in the conclusion. However, keep in mind that when
writing scientific reports, the end result of any statistical test should be a conclusion like the
one above. Simply writing t = 3.18 or p < 0.01 is not an adequate conclusion.
There are a number of common questions that arise when presenting t-test results:
1. What do I do if t is negative? Don’t worry. A t-statistic can come out negative or
positive; its sign simply reflects the direction of the difference (in a two-sample test,
the order in which the two samples are entered into the analysis). Since it is just the
absolute value of t that determines the p-value of the two-sided tests used here, when
presenting the results, just ignore the minus sign and always give t as a positive
number.
2. How many significant figures for t? The t-statistic is conventionally given to 3
significant figures. This is because, in terms of the p-value generated, there is almost
no difference between, say, t = 3.1811 and t = 3.18.
3. Upper or lower case? The t-statistic should always be written in lower case when
writing it in a report (as in the conclusions above). Similarly, d.f. and p are always best
as lower case. Some statistics we encounter later are written in upper case but, even
with these, d.f. and p should be lower case.
4. How should I present p? There are various conventions in use for presenting p-values.
Have a look at the hypotheses and p-values document. Learn them! It’s not possible
to understand scientific papers or prepare reports properly without knowing these
conventions.
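The conventions above can be applied consistently with a small helper. This is a sketch only: report_t() and format_p() are hypothetical functions written for illustration, not part of R. They take the result of t.test() (or anything with the same statistic, parameter and p.value components), drop the sign of t, give it to 3 significant figures and express p as a threshold. How non-significant results should be written is an assumption here; check the hypotheses and p-values document.

```r
# Hypothetical helper (not part of base R): p as a threshold
format_p <- function(p) {
  if (p < 0.001) {
    "p < 0.001"
  } else if (p < 0.01) {
    "p < 0.01"
  } else if (p < 0.05) {
    "p < 0.05"
  } else {
    "p >= 0.05"  # assumption: check the hypotheses and p-values document
  }
}

# Hypothetical helper (not part of base R): formats a t.test() result
report_t <- function(res) {
  # Absolute value of t to 3 significant figures ("#" keeps trailing zeros)
  t_txt <- sprintf("%#.3g", abs(unname(res$statistic)))
  dof <- unname(res$parameter)
  # Whole-number d.f. printed as is; fractional (Welch) d.f. to 1 decimal place
  df_txt <- if (dof == round(dof)) format(dof) else format(round(dof, 1), nsmall = 1)
  sprintf("t = %s, d.f. = %s, %s", t_txt, df_txt, format_p(res$p.value))
}
```

Applied to the numbers from the purple morph example, report_t(list(statistic = c(t = 3.1811), parameter = c(df = 76), p.value = 0.002125)) returns "t = 3.18, d.f. = 76, p < 0.01". The output still has to be placed inside a sentence that answers the original question.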
Two-sample t-test
When do we use a two-sample t-test?
The two-sample t-test is used to compare the means of a numeric variable sampled from two
independent populations. The aim of a two-sample t-test is to evaluate whether or not this
mean is different in the two populations. Here are two examples:
• We’re studying how the dietary habits of Scandinavian eagle-owls vary among
seasons. We suspect that the dietary value of a prey item is different in the winter and
summer. To evaluate this prediction, we measure the size of Norway rat skulls in the
pellets of eagle-owls in summer and winter, and then compare the mean size of rat
skulls in each season using a two-sample t-test.
• We’re interested in the effect of a drug on weight loss in separate groups of patients.
Participants were given either drug A or drug B and after two weeks they were
weighed. They were not weighed before the treatments, so we have one mean for each
group: those that took drug A and those that took drug B. We would then compare the
mean weights of participants in each group using a two-sample t-test.
How does the two-sample t-test work?
Imagine that we have taken a sample of a variable (called ‘X’) from two populations, labelled
‘A’ and ‘B’. Here’s an example of how these data might look if we took a sample of 50 items
from each population:
Figure 2: Example of data used in a two-sample t-test
The two distributions overlap quite a lot. However, this particular observation isn’t all that
relevant here. We’re not interested in the raw values of ‘X’ in the two samples. It’s the
difference between the means that matters. The red lines are the mean of each sample—
sample B obviously has a larger mean than sample A. The question is, how do we decide
whether this difference is ‘real’, or purely a result of sampling variation?
Using a frequentist approach, we tackle this question by first setting up the appropriate null
hypothesis. The null hypothesis here is that there is no difference between the population
means. We then have to work out what the ‘null distribution’ looks like. Here, this is the
sampling distribution of the differences between sample means under the null hypothesis.
Once we have the null distribution worked out we can calculate a p-value.
The important difference is that we have to make an extra assumption to use the two-sample
t-test: we have to assume the variable is normally distributed in each population. If this
assumption is valid, then the null distribution will have a known form, which is closely
related to the t-distribution.
We only need to use a few pieces of information to carry out a two-sample t-test. These are
basically the same quantities needed to construct the one-sample t-test, except now there
are two samples involved. We need the sample sizes of A and B, the sample variances of A
and B, and the estimated difference between the sample means. That’s it.
How does it actually work? The two-sample t-test is carried out as follows:
Step 1. Calculate the two sample means, then calculate the difference between these
estimates. This estimate is our ‘best guess’ of the true difference between means. As with the
one-sample test, its role in the two-sample t-test is to allow us to construct a test statistic.
Step 2. Estimate the standard error of the difference between the sample means under the
null hypothesis of no difference. This gives us an idea of how much sampling variation we
expect to observe in the estimated difference, if there were actually no difference between
the means.
There are a number of different options for estimating this standard error. Each one makes a
different assumption about the variability of the two populations. Whichever choice we
make, the calculation always boils down to a simple formula involving the sample sizes and
sample variances. The standard error gets smaller when the sample sizes grow, or when the
sample variances shrink. That’s the important point really.
Step 3. Once we have estimated the difference between sample means and its standard error,
we can calculate the test statistic. This is a type of t-statistic, which we calculate by dividing
the difference between sample means (from step 1) by the estimated standard error of the
difference (from step 2):
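Written as a formula for Welch's version (the one R uses by default), with x̄_A and x̄_B the two sample means, s_A² and s_B² the sample variances and n_A and n_B the sample sizes:

```latex
t = \frac{\bar{x}_A - \bar{x}_B}{\mathrm{SE}_{\bar{x}_A - \bar{x}_B}},
\qquad
\mathrm{SE}_{\bar{x}_A - \bar{x}_B} = \sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}
```

Student's original version differs only in the standard error, which it builds from a single pooled variance estimate.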
This t-statistic is guaranteed to follow a t-distribution if the normality assumption is met. This
knowledge leads to the final step…
Step 4. Compare the test statistic to the theoretical predictions of the t-distribution to assess
the statistical significance of the observed difference. That is, we calculate the probability that
we would have observed a difference between means with a magnitude as large as, or larger
than, the observed difference, if the null hypothesis were true. That’s the p-value for the test.
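As with the one-sample test, the steps can be checked by hand. This minimal sketch uses simulated data (an assumption: two groups of 50 drawn from normal distributions with different means and standard deviations), computes the Welch standard error and t-statistic, approximates the degrees of freedom with the Welch-Satterthwaite formula, and compares the result with t.test(), whose default is Welch's version.

```r
# Simulated data (an assumption for illustration, not workshop data)
set.seed(3)
a <- rnorm(50, mean = 10, sd = 2)
b <- rnorm(50, mean = 11, sd = 3)

n_a <- length(a)
n_b <- length(b)
diff_means <- mean(a) - mean(b)            # Step 1: difference between means
se_a2 <- var(a) / n_a                      # squared standard error of each mean
se_b2 <- var(b) / n_b
se_diff <- sqrt(se_a2 + se_b2)             # Step 2: Welch standard error
t_stat <- diff_means / se_diff             # Step 3: t-statistic
# Welch-Satterthwaite approximation to the degrees of freedom
dof <- (se_a2 + se_b2)^2 / (se_a2^2 / (n_a - 1) + se_b2^2 / (n_b - 1))
p_val <- 2 * pt(-abs(t_stat), df = dof)    # Step 4: two-sided p-value

res <- t.test(a, b)                        # Welch's version is the default
cat("by hand: t =", round(t_stat, 4), " df =", round(dof, 2),
    " p =", signif(p_val, 4), "\n")
print(res)
```

Because of the Welch-Satterthwaite approximation, the degrees of freedom are generally not a whole number, which is why the plant morph output later in this workshop reports df = 140.69.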
We could step through the various calculations involved in these steps, but there isn’t much
to be gained by doing this. The formulas for the standard two-sample t-test and its variants
are summarised on the t-test Wikipedia page. There’s no need to learn these; we’re going to
let R handle it again.
Let’s review the assumptions of the two-sample t-test first…
Assumptions of the two-sample t-test
There are several assumptions that need to be met for a two-sample t-test to be valid. These
are basically the same assumptions that matter for the one-sample version. We start with the
most important and work down the list in decreasing order of importance:
1. Independence. Remember what we said in our discussion of the one-sample t-test. If
the data are not independent, the p-values generated by the test will be too small,
and even mild non-independence can be a serious problem. The same is true of the
two-sample t-test.
2. Measurement scale. The variable that we are working with should be measured on an
interval or ratio scale. It makes little sense to apply a two-sample t-test to a variable
that isn’t measured on one of these scales.
3. Normality. The two-sample t-test will produce exact p-values if the variable is
normally distributed in both populations. Just as with the one-sample version, the
two-sample t-test is fairly robust to mild departures from normality when the sample
sizes are small, and this assumption matters even less when the sample sizes are large.
How do we evaluate the first two assumptions? As with the one-sample test, these are
aspects of experimental design—we can only evaluate them by thinking about the data. The
normality assumption may be checked by plotting the distribution of each sample. The
simplest way to do this is with histograms or dot plots. Note that we have to examine the
distribution of each sample, not the combined distribution of both samples. If both samples
look approximately normal then it’s fine to proceed with the two-sample t-test, and if we
have a large sample we don’t need to worry too much about moderate departures from
normality.
What about the equal variance assumption?
It is sometimes said that when applying a two-sample t-test the variance (i.e. the dispersion)
of each sample must be the same, or at least quite similar. This would be true if we were using
the original version of Student’s two-sample t-test. However, R doesn’t use this version of the
test by default. R uses the “Welch” version of the two-sample t-test, which does not rely on the
equal variance assumption. As long as we stick with this version of the t-test, the equal
variance assumption isn’t one we need to worry about.
Is there ever any reason not to use the Welch two-sample t-test? The alternative is to use the
original Student’s t-test. This version of the test is a little more powerful than Welch’s version,
in the sense that it is more likely to detect a difference in means. However, the increase in
statistical power is really quite small when the sample sizes of each group are similar, and the
original test is only correct when the population variances are identical. Since we can never
prove the ‘equal variance’ assumption—we can only ever reject it—it is generally safer to just
use the Welch two-sample t-test.
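In R, the choice between the two versions is made with the var.equal argument of t.test(). This sketch uses simulated data (an assumption: two groups with the same mean but very different spreads) to show the two versions side by side. Student's version always uses n_A + n_B - 2 degrees of freedom, whereas Welch's degrees of freedom are never larger than that and fall when the sample variances differ.

```r
# Simulated data (an assumption for illustration, not workshop data)
set.seed(4)
g1 <- rnorm(40, mean = 5, sd = 1)
g2 <- rnorm(45, mean = 5, sd = 3)

welch   <- t.test(g1, g2)                    # default: var.equal = FALSE
student <- t.test(g1, g2, var.equal = TRUE)  # original Student's version

cat(welch$method,   ": df =", round(unname(welch$parameter), 2), "\n")
cat(student$method, ": df =", unname(student$parameter), "\n")
```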
One last warning. Student’s two-sample t-test assumes the variances of the populations are
identical. It is the population variances, not the sample variances, that matter. There are
methods for comparing variances, and people sometimes suggest using these to select ‘the
right’ t-test. This is bad advice. For reasons just outlined, there’s little advantage to using
Student’s version of the test if the variances really are the same. What’s more, the process of
picking the test based on the results of another statistical test affects the reliability of the
resulting p-values.
Carrying out a two-sample t-test in R
We’ll work with the plant morph example again to learn how to carry out a two-sample t-test
in R. We’ll use the test to evaluate whether or not the mean dry weight of purple plants
is different from that of green plants.
Read the data in MORPH_DATA.CSV into an R data frame, giving it the name morph.data. N.B.
This should already be read in from the one sample t-test.
Next, we need to explore the data. Plot separate histograms of the purple and green morph
weight data to check the normality assumption. You should already have subsetted the purple
data and produced a histogram of it. You have learnt how to do this in the data
visualisation workshop.
The sample sizes are 173 (green plants) and 77 (purple plants). These are good-sized samples,
so hopefully the normality assumption isn’t a big deal here. Nonetheless, we still need to check
the distributional assumptions. There is nothing too ‘non-normal’ about the two samples, so
it’s reasonable to assume they both came from normally distributed populations.
Carrying out the test
The function we need to carry out a two-sample t-test in R is the t.test function, i.e. the same
one we used for the one-sample test.
Remember, morph.data has two columns: Weight contains the dry weight biomass of each
plant, and Colour is an index variable that indicates which sample (plant morph) an
observation belongs to. Here’s the code to carry out the two-sample t-test:
t.test(Weight ~ factor(Colour), data=morph.data)
We have suppressed the output for now so that we can focus on how to use the function. We
have to supply two arguments:
1. The first argument is a formula. We know this because it includes a ‘tilde’ symbol: ~.
The variable name on the left of the ~ must be the variable whose mean we want to
compare (Weight). The variable on the right must be the indicator variable that says
which group each observation belongs to (Colour).
2. The second argument is the name of the data frame that contains the variables listed
in the formula. This should be morph.data as we want to look at both the green and
purple morphs.
Let’s take a look at the output:
The first part of the output reminds us what we did. The first line tells us what kind of t-test
we used. This says: Welch Two Sample t-test, so we know that we used the Welch version
of the test that accounts for the possibility of unequal variance. The next line reminds us about
the data. This says: data: Weight by factor(Colour), which is R-speak for ‘we compared the means
of the Weight variable, where the sample membership is defined by the values of
the Colour variable’.
The third line of text is the most important. This says: t = -2.7808, df = 140.69, p-value =
0.006165. The first part, t = -2.7808, is the test statistic (i.e. the value of the t-statistic). The
second part, df = 140.69, summarises the ‘degrees of freedom’ (see the box below); Welch’s
version of the test usually gives a fractional value. The third part, p-value = 0.006165, is the
all-important p-value. This says there is a statistically significant difference in the mean dry
weight biomass of the two colour morphs, because p < 0.05. As the p-value is less than 0.01
but greater than 0.001, we would report this as ‘p < 0.01’.
The fourth line of text (alternative hypothesis: true difference in means is not equal to 0)
simply reminds us of the alternative to the null hypothesis (H1). We can ignore this.
The next two lines show us the ‘95% confidence interval’ for the difference between the
means. Just as with the one sample t-test we can think of this interval as a rough summary of
the likely values of the true difference (again, a confidence interval is more complicated than
that in reality).
The last few lines summarise the sample means of each group. This is useful because it shows
the mean of each group, so you can see which one is larger.
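The sign of t in this output (t = -2.7808, even though purple plants are heavier) comes from the order of the groups. t.test() treats the grouping variable as a factor, whose levels are by default sorted alphabetically, and the t-statistic is based on the mean of the first level minus the mean of the second. The sketch below assumes the two colours are coded "Green" and "Purple" and uses simulated stand-in data (hypothetical values with means near the 708 g and 767 g reported for this example and an assumed standard deviation of 150 g), not the real morph.data.

```r
# Simulated stand-in for morph.data (hypothetical values, not workshop data)
set.seed(5)
toy <- data.frame(
  Weight = c(rnorm(173, mean = 708, sd = 150), rnorm(77, mean = 767, sd = 150)),
  Colour = rep(c("Green", "Purple"), times = c(173, 77))
)

res <- t.test(Weight ~ factor(Colour), data = toy)
print(levels(factor(toy$Colour)))  # "Green" "Purple": alphabetical order
print(res$estimate)                # group means, in the same order as the levels

# Group means computed directly, to confirm which group is larger
grp_means <- tapply(toy$Weight, toy$Colour, mean)
print(grp_means)

# t is (mean of first level - mean of second level) / SE,
# so its sign matches the sign of this difference
print(sign(unname(res$statistic)) == sign(grp_means[["Green"]] - grp_means[["Purple"]]))
```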
Summarising the result
Having obtained the result, we need to report it. We should go back to the original question
to do this. In our example the appropriate summary is:
Mean dry weight biomass of purple and green plants differs significantly (Welch’s t = 2.78,
d.f. = 140.7, p < 0.01), with purple plants being the larger.
This is a concise and unambiguous statement in response to our initial question. The
statement indicates not just the result of the statistical test, but also which of the mean values
is the larger. Always indicate which mean is the larger. It is sometimes appropriate to also
give the values of the means:
The mean dry weight biomass of purple plants (767 grams) is significantly greater than that
of green plants (708 grams) (Welch’s t = 2.78, d.f. = 140.7, p < 0.01).
When we are writing scientific reports, the end result of any statistical test should be a
statement like the one above—simply writing t = 2.78 or p < 0.01 is not an adequate
conclusion!
Paired t-test
When do we use a paired-sample t-test?
We learned before how to use a two-sample t-test to compare means between two
populations. However, there are situations in which data may naturally form pairs of
non-independent observations: the first value in sample A is linked in some way to the first value
in sample B, the second value in sample A is linked with the second value in sample B, and so
on. This is known, unsurprisingly, as a paired-sample design. A common example of a
paired-sample design is the situation where we have a set of organisms, and we record some
measurement from each organism before and after an experimental treatment. For example,
if we were studying heart rate in relation to position (sitting vs. standing) we might measure
the heart rate of a number of people in both positions. In this case the heart rate of a
particular person when sitting is paired with the heart rate of the same person when standing.
In biology, we often have the problem that there is a great deal of variation between the
items we’re studying (individual organisms, forest sites, etc.). There may be so much
among-item variation that the effect of any difference among the situations we’re really
interested in is obscured. A paired-sample design gives us a way to control for this variation.
However, we should not use a two-sample t-test when our data have this kind of structure.
Let’s find out why.
Why do we use a paired-sample design?
Consider the following. A drug company wishes to test two drugs for their effectiveness in
treating a rare illness in which glycolipids are poorly metabolised. An effective drug is one that
lowers glycolipid concentrations in patients. The company is only able to find 8 patients willing
to cooperate in the early trials of the two drugs. What’s more, the 8 patients vary in their age,
sex, body weight, severity of symptoms and other health problems.
One way to conduct an experiment that compares the two drugs is to randomly
assign the 8 patients to one or other drug and monitor their performance. However, this kind
of design is very unlikely to detect a statistically significant difference between the
treatments. This is because it provides very little replication, yet we can expect considerable
variability from one person to another in the levels of glycolipid before any treatment is
applied. This variability would lead to a large standard error in the difference between
means.
A solution to this problem is to treat each patient with both drugs in turn and record the
glycolipid concentrations in the blood, for each patient, after a period taking each drug. One
arrangement would be for four patients to start with drug A and four with drug B, and then
after a suitable break from the treatments, they could be swapped over onto the other drug.
This would give us eight replicate observations on the effectiveness of each drug and we can
determine, for each patient, which drug is more effective.
The experimental design, and one hypothetical outcome, is represented in the diagram
below…
Figure 3: Data from glycolipid study, showing paired design. Each patient is denoted by a
unique number.
Each patient is represented by a unique number (1-8). The order of the drugs in the plot does
not matter—it doesn’t mean that Drug A was tested before Drug B just because Drug A
appears first. Notice that there is a lot of variability in these data, both in the glycolipid levels
of each patient, and in the amount by which the drugs differ in their effects (e.g. the drugs
have roughly equal effects for patient 5, while drug B appears to be more effective for patient
2). The pattern also suggests that, although glycolipid levels vary a good deal between
patients, Drug B seems to reduce glycolipid levels more than Drug A.
The advantage to using a paired-sample design in this case is clear if we look at the results we
might have obtained on the same patients, but where they have been divided into two groups
of four, giving one group Drug A and one group Drug B:
Figure 4: Data from glycolipid study, ignoring paired design.
The patients and their glycolipid levels are identical to those in the previous diagram, but only
patients 2, 3, 4 and 8 (selected at random) were given Drug A, while only patients 1, 5, 6, and
7 were given Drug B. The means of the two groups are different, with Drug B performing
better, but the associated standard error would also be large relative to this difference. A
two-sample t-test would certainly fail to identify a significant difference between the two
drugs.
So, it would be quite possible to end up with two groups where there was no clear difference
in the mean glycolipid levels between the two drug treatments even though Drug B seems to
be more effective in the majority of patients. What the pairing is doing is allowing us to factor
out (i.e. remove) the variation among individuals, and concentrate on the differences
between the two treatments. The result is a much more sensitive evaluation of the effect
we’re interested in.
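To see this with numbers, here is a small R sketch. The baseline levels are invented (hypothetical values chosen to differ a lot between patients), and the Drug B values add the illustrative within-patient differences listed later in this workshop; none of this is the GLYCOLIPID.CSV data. The same 16 values are tested once ignoring the pairing (a Welch two-sample t-test, which is R's default) and once as pairs.

```r
# Hypothetical baseline glycolipid levels for 8 patients (invented values
# that vary a lot from patient to patient)
baseline <- c(20, 35, 28, 50, 42, 31, 60, 25)

# Illustrative within-patient differences (Drug B minus Drug A)
d <- c(-3.9, -4.3, 2.5, -4.5, 0.5, -3.9, -7.1, -2.6)

drug_A <- baseline       # level on Drug A
drug_B <- baseline + d   # level on Drug B for the same patient

# Ignoring the pairing: Welch two-sample t-test on the 16 values
unpaired <- t.test(drug_B, drug_A)

# Respecting the pairing: test the within-patient differences
paired <- t.test(drug_B, drug_A, paired = TRUE)

c(unpaired = unpaired$p.value, paired = paired$p.value)
```

With these made-up values the unpaired p-value is far above 0.05, while the paired test, which only sees the within-patient changes, gives p < 0.05.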
The next question is, how do we go about analysing paired data in a way that properly
accounts for the structure in the data?
How do you carry out a t-test on paired-samples?
It should be clear why a paired-sample design might be useful, but how do we actually
construct the right test? The ‘trick’ is to work directly with the differences between pairs of
values. In the case of the glycolipid levels illustrated in Figure 3, Drug B produced a greater
decrease in glycolipids than Drug A in 75% of patients.
If we calculate the actual differences (i.e. subtract the value for Drug A from the value for
Drug B) for each patient, we might see something like…
-3.9 -4.3 2.5 -4.5 0.5 -3.9 -7.1 -2.6
Notice that there are only two positive values in this sample of differences, one of which is
fairly close to 0. The mean difference is -2.9, i.e. on average, glycolipid levels are lower with
Drug B. Another way of stating this observation is that within subjects (patients), the mean
difference between drug B and drug A is negative. A paired-sample design focusses on the
within-subject (or more generally, within-item) change.
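These summaries can be checked directly in R. The vector `d` below simply holds the eight illustrative differences listed above.

```r
# Illustrative within-patient differences (Drug B minus Drug A) from the text
d <- c(-3.9, -4.3, 2.5, -4.5, 0.5, -3.9, -7.1, -2.6)

mean(d)      # average within-patient change: about -2.91
sum(d > 0)   # number of patients with a positive difference: 2
mean(d < 0)  # proportion of patients with lower levels on Drug B: 0.75
```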
If, on the other hand, the two drugs had had similar effects then what would we expect to
see? We would expect no consistent difference in glycolipid levels between the Drug A and
Drug B treatments. Glycolipid levels are unlikely to remain exactly the same over time, but
there shouldn’t be any pattern to these changes with respect to the drug treatment: some
patients will show increases, some decreases, and some no change at all. The mean of the
differences in this case should be somewhere around zero (though sampling variation will
ensure it isn’t exactly equal to zero).
So, to carry out a t-test on paired-sample data we have to: 1) find the mean of the difference
of all the pairs and 2) evaluate whether this is significantly different from zero. We already
know how to do this! This is just an application of the one-sample t-test, where the expected
value (i.e. the null hypothesis) is 0. The thing to realise here, is that although we started out
with two sets of values, what matters is the sample of differences between pairs, and the
population we’re interested in is a ‘population of differences’.
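Written out, this is the usual one-sample statistic applied to the n differences d_i, with 0 as the hypothesised mean. The worked numbers use the eight illustrative differences listed above; they are not the values in the data file analysed later, so they do not reproduce the t = -2.6209 reported there.

```latex
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad
s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(d_i - \bar{d}\right)^2}, \qquad
t = \frac{\bar{d} - 0}{s_d/\sqrt{n}}, \qquad \text{d.f.} = n - 1

% With the eight illustrative differences:
\bar{d} = \frac{-23.3}{8} = -2.9125, \quad
s_d = \sqrt{\frac{64.96875}{7}} \approx 3.047, \quad
t \approx \frac{-2.9125}{3.047/\sqrt{8}} \approx -2.70, \quad \text{d.f.} = 7
```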
When used to analyse paired data in this way, the test is referred to as a paired-sample t-test.
This is not wrong, but it is important to remember that a paired-sample t-test is just a one-sample t-test applied to the sample of differences between pairs of associated observations.
It isn’t really a new kind of test; it is a one-sample t-test applied to a new kind of
situation.
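A quick way to convince yourself of this is to run both versions on the same numbers. In this sketch the Drug A values are hypothetical and the Drug B values are built from the illustrative differences above; the two calls should return the same t, d.f. and p-value.

```r
# Illustrative paired data: hypothetical Drug A levels for 8 patients, and
# Drug B levels built from the within-patient differences quoted in the text
drug_A <- c(20, 35, 28, 50, 42, 31, 60, 25)
d      <- c(-3.9, -4.3, 2.5, -4.5, 0.5, -3.9, -7.1, -2.6)
drug_B <- drug_A + d

# 1) Paired-sample t-test: R forms drug_B - drug_A internally
paired <- t.test(drug_B, drug_A, paired = TRUE)

# 2) One-sample t-test on the differences, null hypothesis mu = 0
one_sample <- t.test(drug_B - drug_A, mu = 0)

# Same t statistic, degrees of freedom and p-value
rbind(paired     = c(paired$statistic, paired$parameter, p = paired$p.value),
      one_sample = c(one_sample$statistic, one_sample$parameter, p = one_sample$p.value))
```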
Assumptions of the paired-sample t-test
The assumptions of a paired-sample t-test are no different from the one-sample t-test. After
all, they boil down to the same test! We just have to be aware of the target sample. The key
point to keep in mind is that it is the sample of differences that is important, not the original
data. There is no requirement for the original data to be drawn from a normal distribution
because the normality assumption applies to the differences. This is very useful, because even
where the original data seem to be drawn from a non-normal distribution, the differences
between pairs can often be acceptably normal. The differences do need to be measured on
an interval or ratio scale, but this is guaranteed if the original data are on one of these scales.
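In practice this means the normality checks are run on the differences, not on the Drug A and Drug B values separately. A minimal sketch using the illustrative differences is below; shapiro.test() is an optional formal test that this workshop does not otherwise use, and with only 8 values it has little power, so treat the plots as the main evidence.

```r
d <- c(-3.9, -4.3, 2.5, -4.5, 0.5, -3.9, -7.1, -2.6)

# Visual checks on the differences, not on the raw Drug A / Drug B values
hist(d, main = "Within-patient differences", xlab = "Drug B - Drug A")
qqnorm(d)
qqline(d)   # points close to this line suggest approximate normality

# Optional formal test (low power with n = 8)
sw <- shapiro.test(d)
sw$p.value
```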
Carrying out a paired-sample t-test in R
R offers the option of a paired-sample t-test to save us the effort of calculating differences. It
calculates the differences between pairs for us and then carries out a one-sample test on
those differences. We’ll look at how to do it the ‘old fashioned’ way first—calculating the
differences ourselves and running a one-sample test—before using the short-cut method
provided by R.
Staying with the problem of trials of two drugs for controlling glycolipid levels, the serum
glycolipid concentration data from the trial illustrated above are stored in the
GLYCOLIPID.CSV file.
Download this file and place it in the working directory. Read GLYCOLIPID.CSV into an R data
frame, giving it the name glycolipid.
As always, we should start by looking at the raw data by using:
summary(glycolipid)
The data set includes the following variables: Patient indexes the patient identity, Sex is the
sex of the patient (we don’t need this), Drug denotes the drug treatment, and Glycolipid is the
glycolipid level. Glycolipid_A holds the glycolipid levels in response to Drug A for patients 1-8,
and Glycolipid_B holds the glycolipid levels in response to Drug B for the same patients.
Next, we need to calculate the difference within each pair. We can do this by creating a
new variable, as you did in the data visualisation workshop.
Create a new variable called glycolipid_diffs in the glycolipid data frame by subtracting one
group from the other, using the Glycolipid_A and Glycolipid_B columns (subtract Glycolipid_A
from Glycolipid_B, so that negative values mean lower glycolipid levels on Drug B).
What you did was calculate the difference between the two glycolipid concentrations within
each patient. The result is stored in a new column, glycolipid_diffs, in the glycolipid data
frame. These are the data we’ll use to carry out the paired-sample t-test.
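For reference, here is the same step on a small made-up data frame. The name `demo` and its values are hypothetical and only mimic the Glycolipid_A / Glycolipid_B layout described above; check the column names in your own file with names(glycolipid) before copying the pattern. The order of subtraction sets the sign of every difference.

```r
# Hypothetical wide-format data: one row per patient, one column per drug
demo <- data.frame(
  Patient      = 1:4,
  Glycolipid_A = c(20.0, 35.0, 28.0, 50.0),
  Glycolipid_B = c(16.1, 30.7, 30.5, 45.5)
)

# New column: Drug B minus Drug A for each patient (row by row)
demo$diffs <- demo$Glycolipid_B - demo$Glycolipid_A
demo

# Reversing the subtraction flips every sign (and the sign of t),
# but leaves a two-sided p-value unchanged
demo$Glycolipid_A - demo$Glycolipid_B
```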
We should check that the differences could plausibly have been drawn from a normal
distribution, though normality is quite hard to assess with only 8 observations. Create a
histogram of the differences data.
The differences look roughly normal, so let’s carry out a one-sample t-test on them. This test
is easy to do in R:
t.test(glycolipid$glycolipid_diffs)
We don’t have to set the data argument to carry out a one-sample t-test on the differences.
We just passed along the numeric vector of differences extracted from the glycolipid data
frame (using the $ operator). What happened to the mu argument used to set up the null
hypothesis? Remember, the null hypothesis is that the population mean difference is zero. R
assumes mu = 0 if we don’t supply it, so there is no need to set it here.
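You can confirm the default by running the test with and without mu. This sketch uses the illustrative differences from earlier in the workshop rather than your data file.

```r
d <- c(-3.9, -4.3, 2.5, -4.5, 0.5, -3.9, -7.1, -2.6)

default_mu  <- t.test(d)           # mu not supplied
explicit_mu <- t.test(d, mu = 0)   # null hypothesis written out

# Identical statistic and p-value: mu = 0 is the default
c(default_mu$statistic, explicit_mu$statistic)
c(default_mu$p.value, explicit_mu$p.value)
default_mu$null.value   # the hypothesised mean used by the test
```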
The output is quite familiar. The first line reminds us what kind of test we did, and the second
line reminds us what data we used to carry out the test. The third line is the important one: t
= -2.6209, df = 7, p-value = 0.03436. This gives the t-statistic, the degrees of freedom, and
the all-important p-value associated with the test. The p-value tells us that the mean within-individual difference is significant at the p < 0.05 level.
We need to express these results in a clear sentence incorporating the relevant statistical
information:
Individual patients had significantly lower serum glycolipid concentrations when treated with
Drug B than when treated with Drug A (t = 2.62, d.f. = 7, p < 0.05).
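The numbers for this sentence can also be pulled straight out of the object that t.test() returns, which avoids copying errors. The sketch below uses the illustrative differences (not the GLYCOLIPID.CSV data), so its t differs from the 2.62 above; the format string is just one way to match the reporting style used here.

```r
d   <- c(-3.9, -4.3, 2.5, -4.5, 0.5, -3.9, -7.1, -2.6)
res <- t.test(d)

t_val <- unname(res$statistic)   # t statistic (negative: lower on Drug B)
dof   <- unname(res$parameter)   # degrees of freedom = number of pairs - 1
p_val <- res$p.value

# Report |t| as in the example sentence, plus the 0.05 threshold
report <- sprintf("t = %.2f, d.f. = %d, p %s 0.05",
                  abs(t_val), as.integer(dof), if (p_val < 0.05) "<" else ">")
print(report)   # "t = 2.70, d.f. = 7, p < 0.05"
```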
There are a couple of things to point out in interpreting the result of such a test:
1. The sample of differences was used in the test, not the sample of paired observations.
This means the degrees of freedom for a paired-sample t-test are one less than the
number of differences (= number of pairs), not one or two less than the total number
of observations.
2. Since we have used a paired-sample design, our conclusion stresses the fact that the
use of Drug B results in a lower glycolipid level in individual patients; it doesn’t say
that the use of Drug B resulted in lower glycolipid concentrations for everyone given
Drug B than for anyone given Drug A.
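Point 1 is easy to see by asking R for the degrees of freedom both ways. The Drug A values here are hypothetical; the pooled two-sample test is shown only to illustrate the wrong d.f. you would get by ignoring the pairing.

```r
drug_A <- c(20, 35, 28, 50, 42, 31, 60, 25)   # hypothetical levels on Drug A
drug_B <- drug_A + c(-3.9, -4.3, 2.5, -4.5, 0.5, -3.9, -7.1, -2.6)

paired_df <- t.test(drug_B, drug_A, paired = TRUE)$parameter
pooled_df <- t.test(drug_B, drug_A, var.equal = TRUE)$parameter

paired_df   # 8 pairs - 1 = 7
pooled_df   # 16 observations - 2 = 14 (not appropriate for paired data)
```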
Using the paired = TRUE argument
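R does have a built-in procedure for doing paired-sample t-tests. Now that we’ve done it the
hard way, let’s try carrying out the test using the built-in procedure. This looks very similar to
a two-sample t-test, except that we have to set the paired argument of the t.test function
to TRUE:
t.test(glycolipid$Glycolipid_B, glycolipid$Glycolipid_A, paired = TRUE)
Giving Drug B first means R calculates Drug B minus Drug A for each patient, so the sign of t
matches a one-sample test on Drug B minus Drug A differences. You may also see this written
with a formula, t.test(Glycolipid ~ Drug, data = glycolipid, paired = TRUE); that form only
works in R versions before 4.4.0, which no longer accept paired = TRUE in the formula interface.
R takes care of the differencing for us, so now we can work with the original Glycolipid_A and
Glycolipid_B values rather than the glycolipid_diffs column constructed above. We won’t step
through the output because it should make sense by this point.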
R makes it easy to do paired-sample t-tests. It really doesn’t matter which method we use to
carry out the test. Just don’t forget that a paired-sample t-test is only a one-sample test on
paired differences.
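If your data are in long format (one row per patient per drug, like the Patient, Drug and Glycolipid columns described earlier), a version-independent approach is to split the values by drug after sorting by patient, because the default t.test() method pairs values by position. Recent versions of R (4.4.0 and later) reject paired = TRUE together with a formula. The data frame `long` below is hypothetical, with rows deliberately shuffled.

```r
# Hypothetical long-format data: one row per patient per drug, rows shuffled
long <- data.frame(
  Patient    = c(3, 1, 2, 4, 1, 4, 2, 3),
  Drug       = c("A", "B", "A", "B", "A", "A", "B", "B"),
  Glycolipid = c(28, 16.1, 35, 45.5, 20, 50, 30.7, 30.5)
)

# Pairing depends on row order, so sort by patient before splitting by drug
long   <- long[order(long$Patient), ]
drug_A <- long$Glycolipid[long$Drug == "A"]
drug_B <- long$Glycolipid[long$Drug == "B"]

res <- t.test(drug_B, drug_A, paired = TRUE)   # tests Drug B minus Drug A
res
```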
Do it yourself!
The file spider.csv is on Canvas. The data here are from an experiment where researchers
removed a pedipalp from male spiders. In males these are enlarged and used in mating. They
hypothesised that the pedipalp was a sexual handicap to the male spiders. They measured the
speed (in cm/s) of spiders before and after amputation.
Does amputation appear to have an effect on the speed of spiders?
Conduct the appropriate statistical test and provide a suitable figure.
Research article
The impact of e-service quality and customer satisfaction on customer
behavior in online shopping
Paulo Rita a,*, Tiago Oliveira a, Almira Farisa b
a Nova Information Management School (NOVA IMS), Universidade Nova de Lisboa, Portugal
b ISCTE Business School (IBS), Instituto Universitario de Lisboa, Portugal
ARTICLE INFO
Keywords:
e-service quality
Customer satisfaction
Customer trust
Consumer behavior
Online shopping
Retailing
Business
Information science
Marketing
ABSTRACT
The purpose of this study is to develop new knowledge to better understand the most important dimensions of e-service quality that have an impact on customer satisfaction, customer trust, and customer behavior, building on
existing literature on e-service quality in online shopping. This study focuses on the four dimensions of the e-service
quality model that better predict customer behavior. It tests not only the impact of customer satisfaction on
customer behavior such as repurchase intention, word of mouth, and site revisit, but also the impact of customer
trust. The result is expected to extend knowledge about how country culture relates to the relevance of different
e-service quality attributes. Data from an online survey of 355 Indonesian online consumers were used to test the
research model using structural equation modelling. The analytical results showed that three dimensions of e-service quality, namely website design, security/privacy and fulfilment, affect overall e-service quality. Meanwhile, customer service is not significantly related to overall e-service quality. Overall e-service quality is statistically significantly related to customer behavior. Future research should consider a variety of product segments
and/or other industries to make sure that the measurement works equally well. In other industry settings, the
measurement may need to be adjusted. Future research could also use different methodologies such as focus groups
and interviews.
1. Introduction
The Internet has been generating consumer empowerment for over a
decade (Pires et al., 2006). Brick-and-mortar stores are slowly but surely
closing down because of the rise of e-commerce (Quora, 2017).
Compared with physical stores, online businesses offer convenience to
customers (Business.com, 2017). Customers can just sit at their home,
place their orders, pay via credit card, and wait until the goods are
delivered to their home. E-commerce in Indonesia is growing fast due to
the growth of internet penetration. In March 2017, internet penetration
reached slightly over 50% with 104.96 million internet users. The
number of Indonesian internet users is projected to reach 133.39 million
in 2021, making Indonesia one of the biggest online markets worldwide
(Statista, 2018b). According to Statista (2018a), Indonesia currently has
approximately 28.2 million online shoppers and is projected to experience a 3–4% annual increase over the next few years. Users aged
25–34 account for 12.8 million of the people who shop
online in Indonesia.
The rapid development of information technology led to a cultural
shift. Customers started shopping via e-commerce rather than in physical
stores. Physical businesses have been attempting to gain a competitive
advantage by using e-commerce to interact with customers (Lee and Lin,
2005). In online businesses, competitors can easily enter the market
because of low entry barriers (Wang et al., 2016). From the customer
perspective, they have low switching costs to shop from one online store
to another (Mutum et al., 2014). In physical businesses and online
businesses, customer shopping experience influences future customer
behavior, including repurchase intention, store revisit intention, and
word of mouth (WOM) (Chang and Wang, 2011).
The biggest challenge for online shopping is to provide and maintain
customer satisfaction. A key success factor for surviving in a fiercely
competitive e-environment is a strategy that focuses on services. A
company must deliver superior service experiences to its customers, so
that they will repurchase and be loyal to the firm (Gounaris et al., 2010).
In order to obtain high levels of customer satisfaction, high service
quality is needed, which often leads to favorable behavioral intentions
(Brady and Robertson, 2001). A website with good system quality, information quality, and electronic service quality is a key to success in
e-commerce (Sharma and Lijuan, 2015).
* Corresponding author. E-mail address: prita@novaims.unl.pt (P. Rita).
Heliyon 5 (2019) e02690. Journal homepage: www.heliyon.com
https://doi.org/10.1016/j.heliyon.2019.e02690
Received 8 October 2018; Received in revised form 5 August 2019; Accepted 15 October 2019
2405-8440/© 2019 Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Many researchers have studied the concept of e-service quality. The
attributes of e-service quality have a significant association with overall
e-service quality, customer satisfaction, and repurchase intentions, but
not with WOM (Blut et al., 2015). Moreover, Tsao et al. (2016) studied
the impact of e-service quality on online loyalty based on online shopping
experience in Taiwan and showed that system quality and electronic
service quality had significant effects on perceived value, that in turn had
a significant influence on online loyalty. In addition, Gounaris et al.
(2010) found that e-service quality had a positive impact on three consumer behavior intentions: purchase intentions, site revisit, and WOM.
Blut (2016) demonstrated that e-service quality had a positive effect on
customer satisfaction, repurchase intention, and WOM for online shoppers in the U.S. Thus, in general, the existing studies about e-service
quality have differences in both methodology and results, with no definite conclusions (Gounaris et al., 2010).
Chang et al. (2013) stated that trust is the most important factor to
attract e-commerce buyers. However, only a few studies about the impact
of service quality on trust, especially within the scope of online business
are available. Rasheed and Abadi (2014) tested the impact of e-service
quality on trust in the overall services industry and found that trust was
considered to be an antecedent of service quality. Furthermore, Saleem
et al. (2017) tested it on the Pakistani airline industry and determined
that trust plays a vital role in driving repurchase intention for all services
business.
Using an incorrectly specified e-service quality model would overestimate the importance of e-service quality attributes (Blut et al., 2015).
In addition, Blut et al. (2015) developed a hierarchical model of e-service
quality that was able to predict customer behavior better than other
established instruments, but only Blut (2016) empirically tested the
conceptual model for online shoppers in the U.S. To address this
research gap, this study empirically tested the Blut et al.
(2015) e-service quality model in order to understand the impact of
e-service quality not only on customer satisfaction, purchase intention
and WOM, but also on customer trust and site revisit.
Country culture was found to affect the relevance of the e-service
quality construct (Blut et al., 2015). Thus, this research empirically tested
the hierarchical model of e-service quality measurement in a new cultural
setting, Indonesia, to see whether it works equally well in different
countries and cultures. Cultural differences in online shopping behavior
may also influence the prioritization of e-service quality attributes, but
this has not yet been investigated (Brusch et al., 2019).
The goals of this research are as follows: (1) to test the hierarchical
model of e-service quality in a new cultural setting, and (2) to make a
parallel comparison of e-service quality perception between two different
cultural settings, Indonesia and the USA.
2. Background
Many researchers have proposed different attributes and dimensions
to measure e-service quality. Dabholkar (1996) conducted an early study
about e-service quality, which examined how customers form expectations of technology-based self-service quality and suggested five main
attributes of e-service quality: speed of delivery, ease of use, reliability,
enjoyment, and control. The result of the study shows that control and
enjoyment were significant determinants of service quality, ease of use
was also a key determinant in service quality, but only for high waiting
time and control groups, while speed of delivery and reliability had no
impact on service quality.
The most common approach to measure service quality is the
SERVQUAL model (Parasuraman et al., 1985). This model is still popular
and currently used in many studies (Alrubaiee & Alkaa’ida, 2011; Kansra
and Jha, 2016; Kitapci et al., 2014). In the online business context, many
researchers modified SERVQUAL into several models. The most
well-known adapted models are WebQual developed by Barnes and
Vidgen (2002) and Loiacono et al. (2002), eTailQ conceived by Wolfinbarger and Gilly (2003), E-S-Qual drafted by Parasuraman et al.
(2005), and the latest hierarchical model of e-service quality proposed by Blut et al. (2015).
Fig. 1. Conceptual model.
Table 1
Measurement of latent constructs (construct, items, source).
Website Design
  Information Quality [Blut (2016); Holloway and Beatty (2008)]
    IQ1. The information on the website is pretty much what I need to carry out my tasks.
    IQ2. The website adequately meets my information needs.
    IQ3. The information on the website is effective.
  Website Aesthetics [Blut (2016); Holloway and Beatty (2008)]
    WA1. The website is visually pleasing.
    WA2. The website displays a visually pleasing design.
    WA3. The website is visually appealing.
  Purchase Process [Blut (2016); Holloway and Beatty (2008)]
    PP1. The website has no difficulties with making a payment online.
    PP2. The purchasing process was not difficult.
    PP3. It is easier to use the website to complete my business with the company than it is to use a telephone or fax or mail a representative.
  Website Convenience [Blut (2016); Holloway and Beatty (2008)]
    WC1. The website displays a visually pleasing easy to read content.
    WC2. The text on the website is easy to read.
    WC3. The website labels are easy to understand.
  Product Selection [Blut (2016); Holloway and Beatty (2008)]
    PS1. All my business with the company can be completed via the website.
    PS2. This website has a good selection.
    PS3. The site has a wide variety of products that interest me.
  Price Offerings [Blut (2016); Holloway and Beatty (2008)]
    PO1. The website offers discounts or free shipping.
    PO2. The website has low prices.
    PO3. The website has lower prices than offline stores.
  Website Personalization [Blut (2016); Holloway and Beatty (2008)]
    WP1. The website allows me to interact with it to receive tailored information.
    WP2. The website has interactive features, which help me accomplish my task.
    WP3. I can interact with the website in order to get information tailored to my specific needs.
  System Availability [Blut (2016); Holloway and Beatty (2008)]
    SA1. When I use the website, there is very little waiting time between my actions and the website’s response.
    SA2. The website loads quickly.
    SA3. The website takes a long time to load. (R)
Customer Service
  Service Level [Blut (2016); Holloway and Beatty (2008)]
    SL1. The online shop provides a telephone number to reach the company.
    SL2. The online shop has customer service representatives available online.
    SL3. The online shop offers the ability to speak to a live person if there is a problem.
  Return Handling/Policies [Blut (2016); Holloway and Beatty (2008)]
    RP1. The online shop provides me with convenient options for returning items.
    RP2. The online shop handles product returns well.
    RP3. The online shop offers a meaningful guarantee.
Security/Privacy
  Security [Blut (2016); Holloway and Beatty (2008)]
    SC1. I feel safe in my transactions with the online shop.
    SC2. The online shop has adequate security features.
    SC3. This site protects information about my credit card.
  Privacy [Blut (2016); Holloway and Beatty (2008)]
    PR1. I trust the online shop to keep my personal information safe.
    PR2. I trust the website administrators will not misuse my personal information.
    PR3. It protects information about my web-shopping behavior.
Fulfillment
  Timeliness of Delivery [Blut (2016); Holloway and Beatty (2008)]
    TD1. The product is delivered by the time promised by the company.
    TD2. This online shop website makes items available for delivery within a suitable time frame.
    TD3. It quickly delivers what I order.
  Order Accuracy [Blut (2016); Holloway and Beatty (2008)]
    OA1. You get what you ordered from this website.
    OA2. The website sends out the items ordered.
    OA3. The website is truthful about its offerings.
  Delivery Condition [Blut (2016); Holloway and Beatty (2008)]
    DC1. The product was damaged during delivery. (R)
    DC2. The ordered products arrived in good condition.
    DC3. The products arrived with major damage. (R)
Overall e-Service Quality [Blut (2016)]
    SQ1. Overall, my purchase experience with this online shop is excellent.
    SQ2. The overall quality of the service provided by this online shop is excellent.
    SQ3. My overall feelings toward this online shop are very satisfied.
Customer Satisfaction [Fornell (1992)]
    S1. I am satisfied with this online shop.
    S2. The online shop is getting close to the ideal online retailer.
    S3. The online shop always meets my needs.
Customer Trust [Gefen (2002); Lee and Turban (2001); Urban et al. (2009)]
    T1. One can expect good advice from this online shop.*
    T2. This online shop is genuinely interested in customer’s welfare.
    T3. If problems arise, one can expect to be treated fairly by this online shop.
    T4. I am happy with the standards by which this online shop is operating.
    T5. This online shop operates scrupulously.
    T6. You can believe the statements of this online shop.
Repurchase Intention [Zeithaml et al. (1996)]
    RI1. I will make more purchases through this online shop in the future.
    RI2. I will increase purchases through this online shop.
    RI3. I will intensify purchases through this online shop.
Word of Mouth [Zeithaml et al. (1996)]
    WOM1. I say positive things about this online shop to other people.
    WOM2. I recommend this online shop to anyone who seeks my advice.
    WOM3. I encourage friends and others to purchase goods from this online shop.
Site Revisit [Gounaris et al. (2010)]
    SR1. I will not to shop again from this online shop. (R)*
    SR2. I will make my next purchase from this online shop.
    SR3. I will re-visit this online shop in the future.
Note: * items have been excluded due to low validity.
Loiacono et al. (2002) developed the WebQual scale to analyze
websites selling books, music, airline tickets, and hotel reservations.
The dimensions of WebQual are informational fit to task, interactivity, trust, response time, ease of understanding, intuitive operations, visual appeal, innovativeness, flow (emotional appeal),
consistent image, on-line completeness, and better than alternative
channels. The study provides researchers with a validated, reliable
measure of website quality. It also adds to the understanding of the technology acceptance model (TAM) by
revealing the components of ease of use and usefulness.
Later, Barnes & Vidgen (2002) also pioneered a new e-service
quality measurement called WebQual that focused on the importance of
easy-to-use websites. The WebQual measurement consists of five attributes: user-friendliness, design, information, trust, and empathy. The
measurement has since been revised several times, up to WebQual 4.0.
Other research conducted by Wolfinbarger and Gilly (2003) used
focus groups to develop eTailQ, an e-service quality model that consists
of a list of attributes categorized in four dimensions: customer service,
privacy/security, website design, and fulfillment/reliability. Pan,
Ratchford and Shankar (2002) analyzed 105 online retailers
comprising 6,739 price observations for 581 items in eight product
categories and proposed five dimensions of e-service quality: reliability, shopping convenience, product information, shipping/handling, and pricing.
Zeithaml et al. (2002) assembled what is currently known about
service quality delivery through websites on five main dimensions:
information availability and content, ease of use, privacy/security,
graphic style, and fulfillment/reliability. A study conducted by Parasuraman et al. (2005) divided e-service quality into two different scales:
the e-service quality scale (E-S-QUAL) and e-service quality recovery
scale (E-RecS-QUAL). Privacy/security, reliability, fulfillment, efficiency, and individualized attention are the dimensions of E-S-QUAL
whereas the dimensions of E-RecS-QUAL are responsiveness, compensation, and contact. The results of the study show that privacy plays a
significant role in customers’ higher-order evaluations pertaining to
websites.
Gounaris et al. (2010) examined the effect of service quality and
satisfaction on WOM, site revisits, and purchase intention in the context
of internet shopping. These authors used the WebQual scale (usability,
information, and interaction) developed by Barnes and Vidgen (2002)
and two additional parameters, aesthetics and after-sales service,
developed by Lee and Lin (2005) to measure e-service quality. The
study used 240 random online interviews from an Internet provider in
Greece and showed that e-service quality had a positive effect on
satisfaction, while it also influenced the customer behavioral intentions, namely site revisits, WOM communication and repeat purchase, both directly and indirectly through satisfaction.
Kitapci et al. (2014) investigated the effect of service quality dimensions on patient satisfaction, identified the effect of satisfaction on
WOM communication and repurchase intention, and looked for a significant relationship between WOM and repurchase intention in the
public healthcare industry. The framework used the SERVQUAL model
developed by Parasuraman et al. (1985) to measure service quality. The
study demonstrated that customer satisfaction had a significant effect
on WOM and repurchase intentions which were observed as highly
related.
The existing measurement of e-service quality in online business has
some weaknesses. According to Blut (2016), E-S-Qual and eTailQ
measurements lack criteria to assess online stores so they cannot suitably explain customer dissatisfaction and their switching to other online stores. The other weakness lies in the ability to predict customer
behavior. Though it covers 13 of 16 attributes of e-service quality,
eTailQ only ranks eighth in its predictive ability and does not perform
well to measure customer service and security (Blut et al., 2015).
WebQual might come first in the ability to predict customer behavior,
but it only has a narrow focus.
Table 2
Cronbach’s alpha, composite reliability (CR), average variance extracted (AVE), and Fornell-Larcker criterion.
Cronbach’s Alpha CR AVE IQ WA PP WC PS PO WP SA SL RP SC PR TD OA DC SQ S T RI WOM SR
IQ 0.868 0.919 0.792 0.890
WA 0.887 0.930 0.815 0.615 0.903
PP 0.782 0.874 0.698 0.526 0.412 0.836
WC 0.892 0.933 0.823 0.611 0.692 0.598 0.907
PS 0.816 0.892 0.734 0.583 0.549 0.610 0.709 0.857
PO 0.780 0.873 0.696 0.488 0.390 0.266 0.325 0.343 0.834
WP 0.834 0.901 0.752 0.471 0.315 0.274 0.291 0.318 0.638 0.867
SA 0.770 0.867 0.686 0.471 0.396 0.277 0.320 0.298 0.574 0.544 0.828
SL 0.774 0.869 0.689 0.405 0.266 0.128 0.238 0.283 0.395 0.588 0.393 0.830
RP 0.876 0.924 0.802 0.419 0.292 0.214 0.242 0.308 0.442 0.510 0.386 0.615 0.895
SC 0.837 0.903 0.758 0.463 0.293 0.222 0.275 0.270 0.470 0.531 0.543 0.430 0.621 0.871
PR 0.895 0.935 0.827 0.448 0.299 0.205 0.264 0.296 0.416 0.444 0.421 0.418 0.545 0.701 0.910
TD 0.896 0.935 0.828 0.487 0.348 0.260 0.288 0.309 0.527 0.586 0.651 0.523 0.439 0.503 0.487 0.910
OA 0.876 0.924 0.802 0.475 0.354 0.245 0.318 0.340 0.572 0.594 0.634 0.443 0.514 0.646 0.584 0.779 0.896
DC 0.734 0.842 0.641 0.425 0.282 0.121 0.203 0.220 0.419 0.434 0.489 0.347 0.368 0.491 0.367 0.510 0.621 0.800
SQ 0.915 0.946 0.855 0.554 0.398 0.294 0.355 0.374 0.561 0.609 0.595 0.385 0.507 0.655 0.516 0.677 0.750 0.555 0.925
S 0.855 0.911 0.774 0.522 0.341 0.285 0.337 0.328 0.584 0.607 0.590 0.379 0.551 0.691 0.597 0.609 0.749 0.543 0.791 0.880
T 0.908 0.931 0.731 0.482 0.327 0.241 0.289 0.331 0.535 0.679 0.615 0.498 0.532 0.632 0.625 0.681 0.730 0.511 0.719 0.795 0.855
RI 0.914 0.946 0.853 0.384 0.287 0.219 0.241 0.258 0.473 0.528 0.534 0.350 0.420 0.457 0.375 0.501 0.579 0.419 0.619 0.722 0.696 0.924
WOM 0.931 0.956 0.880 0.523 0.326 0.261 0.321 0.362 0.564 0.593 0.528 0.369 0.478 0.593 0.512 0.516 0.679 0.491 0.713 0.780 0.755 0.803 0.938
SR 0.849 0.930 0.869 0.390 0.225 0.164 0.220 0.238 0.465 0.565 0.539 0.422 0.522 0.547 0.429 0.510 0.603 0.519 0.682 0.723 0.705 0.768 0.750 0.932
Notes: IQ: Information Quality; WA: Website Aesthetics; PP: Purchase Process; WC: Website Convenience; PS: Product Selection; PO: Price Offerings; WP: Website Personalization; SA: System Availability; SL: Service Level; RP: Returns Handling/Policies; SC: Security; PR: Privacy; TD: Timeliness of Delivery; OA: Order Accuracy; DC: Delivery Condition; SQ: Overall Service Quality; S: Customer Satisfaction; T: Customer Trust; RI: Repurchase Intention; WOM: Word of Mouth; SR: Site Revisit.
*The diagonal values are the square roots of the AVEs; the off-diagonal values are the correlations between constructs.
P. Rita et al. Heliyon 5 (2019) e02690
Table 3
Cross-loadings.
IQ WA PP WC PS PO WP SA SL RP SC PR TD OA DC SQ S T RI WOM SR
IQ1 0.919 0.546 0.489 0.596 0.554 0.501 0.536 0.462 0.471 0.421 0.468 0.495 0.496 0.503 0.442 0.593 0.542 0.526 0.380 0.565 0.436
IQ2 0.906 0.520 0.462 0.482 0.536 0.420 0.424 0.452 0.377 0.385 0.437 0.376 0.441 0.423 0.404 0.478 0.451 0.419 0.344 0.457 0.350
IQ3 0.842 0.583 0.453 0.551 0.462 0.373 0.279 0.336 0.214 0.305 0.321 0.310 0.355 0.330 0.278 0.392 0.389 0.328 0.296 0.359 0.240
WA1 0.547 0.877 0.410 0.661 0.545 0.328 0.266 0.333 0.235 0.258 0.222 0.255 0.304 0.332 0.270 0.341 0.301 0.278 0.260 0.298 0.208
WA2 0.522 0.921 0.310 0.594 0.426 0.354 0.288 0.351 0.246 0.263 0.276 0.261 0.303 0.301 0.247 0.352 0.298 0.293 0.216 0.233 0.160
WA3 0.594 0.910 0.391 0.618 0.512 0.375 0.299 0.388 0.238 0.270 0.294 0.292 0.335 0.324 0.248 0.384 0.325 0.315 0.299 0.347 0.238
PP1 0.411 0.302 0.861 0.466 0.506 0.220 0.251 0.235 0.106 0.196 0.196 0.186 0.257 0.243 0.100 0.254 0.249 0.212 0.171 0.191 0.111
PP2 0.448 0.398 0.868 0.519 0.510 0.215 0.205 0.255 0.083 0.182 0.178 0.164 0.214 0.184 0.102 0.234 0.232 0.197 0.209 0.200 0.145
PP3 0.458 0.327 0.775 0.512 0.512 0.232 0.231 0.201 0.132 0.159 0.182 0.164 0.181 0.189 0.100 0.250 0.233 0.196 0.168 0.264 0.155
WC1 0.579 0.740 0.502 0.908 0.597 0.355 0.297 0.294 0.189 0.246 0.283 0.266 0.281 0.299 0.181 0.373 0.353 0.286 0.249 0.313 0.210
WC2 0.535 0.593 0.590 0.931 0.639 0.262 0.250 0.260 0.208 0.198 0.226 0.229 0.247 0.270 0.165 0.307 0.294 0.244 0.207 0.275 0.189
WC3 0.546 0.543 0.539 0.882 0.695 0.263 0.244 0.319 0.251 0.212 0.238 0.220 0.254 0.295 0.207 0.282 0.268 0.253 0.198 0.285 0.200
PS1 0.417 0.461 0.438 0.522 0.750 0.181 0.183 0.222 0.207 0.219 0.221 0.262 0.210 0.244 0.155 0.251 0.201 0.263 0.172 0.230 0.134
PS2 0.540 0.488 0.552 0.631 0.902 0.347 0.286 0.263 0.259 0.306 0.237 0.262 0.297 0.314 0.213 0.366 0.332 0.277 0.250 0.332 0.209
PS3 0.532 0.467 0.567 0.659 0.908 0.335 0.332 0.277 0.257 0.263 0.236 0.244 0.279 0.310 0.194 0.334 0.297 0.310 0.235 0.356 0.256
PO1 0.382 0.335 0.239 0.273 0.254 0.779 0.436 0.455 0.324 0.311 0.316 0.240 0.419 0.422 0.330 0.335 0.402 0.364 0.340 0.333 0.301
PO2 0.406 0.315 0.220 0.290 0.284 0.876 0.549 0.484 0.359 0.350 0.357 0.345 0.448 0.474 0.321 0.511 0.462 0.428 0.367 0.463 0.381
PO3 0.432 0.329 0.209 0.252 0.318 0.845 0.603 0.497 0.308 0.439 0.495 0.446 0.452 0.531 0.396 0.545 0.589 0.539 0.471 0.602 0.475
WP1 0.433 0.308 0.239 0.235 0.294 0.581 0.817 0.454 0.450 0.428 0.515 0.454 0.487 0.563 0.466 0.570 0.546 0.569 0.493 0.582 0.487
WP2 0.398 0.218 0.198 0.226 0.240 0.563 0.904 0.476 0.516 0.425 0.422 0.332 0.533 0.497 0.378 0.478 0.522 0.613 0.437 0.483 0.488
WP3 0.394 0.290 0.272 0.295 0.290 0.515 0.878 0.483 0.563 0.471 0.440 0.365 0.504 0.484 0.285 0.534 0.507 0.583 0.441 0.475 0.493
SA1 0.314 0.250 0.173 0.186 0.149 0.447 0.446 0.781 0.350 0.365 0.432 0.311 0.487 0.488 0.233 0.428 0.454 0.497 0.424 0.362 0.409
SA2 0.415 0.331 0.256 0.287 0.268 0.536 0.511 0.903 0.338 0.351 0.518 0.336 0.560 0.552 0.401 0.539 0.547 0.510 0.488 0.468 0.475
SA3 0.430 0.393 0.249 0.310 0.306 0.440 0.395 0.796 0.294 0.252 0.396 0.395 0.564 0.531 0.554 0.502 0.460 0.524 0.414 0.470 0.451
SL1 0.330 0.247 0.039 0.175 0.273 0.251 0.435 0.286 0.779 0.448 0.332 0.316 0.367 0.344 0.283 0.306 0.287 0.359 0.268 0.304 0.324
SL2 0.353 0.258 0.150 0.233 0.243 0.320 0.462 0.311 0.863 0.496 0.368 0.380 0.496 0.379 0.338 0.320 0.314 0.441 0.268 0.285 0.379
SL3 0.328 0.163 0.122 0.183 0.194 0.403 0.561 0.375 0.846 0.578 0.370 0.344 0.435 0.378 0.246 0.331 0.340 0.435 0.331 0.329 0.348
RP1 0.409 0.294 0.233 0.238 0.315 0.432 0.468 0.323 0.573 0.908 0.514 0.466 0.376 0.449 0.337 0.421 0.506 0.471 0.379 0.422 0.447
RP2 0.358 0.259 0.209 0.189 0.257 0.389 0.465 0.302 0.551 0.923 0.581 0.536 0.389 0.449 0.269 0.452 0.483 0.445 0.391 0.420 0.444
RP3 0.359 0.232 0.130 0.223 0.255 0.366 0.437 0.417 0.525 0.853 0.576 0.461 0.416 0.483 0.387 0.491 0.491 0.516 0.359 0.444 0.517
SC1 0.418 0.246 0.175 0.244 0.230 0.401 0.513 0.548 0.405 0.502 0.903 0.568 0.517 0.647 0.496 0.681 0.670 0.628 0.452 0.600 0.579
SC2 0.391 0.262 0.182 0.230 0.217 0.443 0.493 0.477 0.363 0.589 0.913 0.587 0.469 0.594 0.443 0.636 0.630 0.584 0.409 0.544 0.493
SC3 0.399 0.255 0.223 0.244 0.256 0.381 0.377 0.390 0.354 0.529 0.790 0.676 0.323 0.441 0.340 0.387 0.500 0.434 0.329 0.400 0.353
PR1 0.441 0.295 0.196 0.270 0.286 0.447 0.422 0.371 0.386 0.529 0.727 0.921 0.443 0.533 0.357 0.502 0.574 0.581 0.339 0.483 0.405
PR2 0.422 0.280 0.195 0.271 0.284 0.375 0.388 0.411 0.375 0.511 0.637 0.947 0.462 0.554 0.334 0.479 0.560 0.548 0.317 0.455 0.387
PR3 0.352 0.237 0.166 0.170 0.235 0.304 0.402 0.366 0.383 0.443 0.537 0.859 0.423 0.508 0.307 0.424 0.491 0.580 0.372 0.460 0.380
TD1 0.448 0.300 0.239 0.288 0.300 0.473 0.503 0.513 0.501 0.322 0.326 0.367 0.880 0.610 0.419 0.564 0.484 0.557 0.390 0.389 0.351
TD2 0.430 0.311 0.229 0.253 0.255 0.478 0.539 0.629 0.461 0.420 0.517 0.433 0.923 0.758 0.483 0.622 0.591 0.632 0.482 0.519 0.512
TD3 0.454 0.339 0.242 0.249 0.290 0.489 0.556 0.628 0.469 0.448 0.517 0.521 0.925 0.749 0.487 0.657 0.582 0.664 0.491 0.492 0.517
OA1 0.399 0.295 0.181 0.276 0.275 0.478 0.494 0.574 0.367 0.420 0.629 0.538 0.678 0.918 0.616 0.670 0.681 0.603 0.463 0.552 0.557
OA2 0.449 0.354 0.228 0.305 0.345 0.540 0.565 0.579 0.406 0.508 0.589 0.505 0.744 0.925 0.611 0.730 0.663 0.635 0.473 0.596 0.541
OA3 0.430 0.299 0.254 0.272 0.292 0.522 0.540 0.550 0.421 0.451 0.513 0.532 0.669 0.841 0.428 0.610 0.671 0.734 0.635 0.688 0.524
DC1 0.338 0.222 0.099 0.182 0.175 0.204 0.250 0.326 0.195 0.260 0.299 0.191 0.278 0.349 0.813 0.312 0.330 0.297 0.261 0.302 0.354
DC2 0.368 0.256 0.130 0.168 0.206 0.484 0.517 0.493 0.379 0.390 0.541 0.461 0.573 0.683 0.820 0.586 0.591 0.594 0.492 0.549 0.541
DC3 0.303 0.185 0.042 0.138 0.134 0.237 0.181 0.301 0.204 0.180 0.257 0.135 0.285 0.356 0.766 0.357 0.297 0.235 0.165 0.242 0.281
SQ1 0.538 0.389 0.291 0.330 0.383 0.537 0.561 0.514 0.393 0.507 0.636 0.486 0.585 0.672 0.545 0.928 0.735 0.643 0.563 0.656 0.635
SQ2 0.489 0.346 0.289 0.343 0.348 0.503 0.557 0.522 0.345 0.467 0.558 0.454 0.618 0.711 0.458 0.925 0.709 0.660 0.570 0.673 0.626
SQ3 0.509 0.369 0.237 0.312 0.307 0.515 0.572 0.613 0.330 0.433 0.622 0.492 0.673 0.697 0.534 0.921 0.748 0.691 0.582 0.650 0.630
S1 0.500 0.363 0.248 0.324 0.316 0.551 0.580 0.578 0.461 0.546 0.655 0.542 0.651 0.796 0.647 0.819 0.912 0.750 0.701 0.751 0.747
S2 0.447 0.238 0.204 0.266 0.245 0.538 0.478 0.517 0.288 0.483 0.640 0.529 0.472 0.578 0.392 0.644 0.880 0.648 0.573 0.668 0.571
S3 0.425 0.287 0.303 0.297 0.300 0.445 0.536 0.453 0.224 0.415 0.520 0.506 0.462 0.574 0.357 0.599 0.846 0.694 0.621 0.629 0.570
T2 0.337 0.210 0.229 0.195 0.249 0.466 0.564 0.553 0.352 0.422 0.444 0.410 0.554 0.537 0.340 0.590 0.664 0.831 0.664 0.643 0.619
T3 0.433 0.299 0.201 0.277 0.293 0.463 0.626 0.524 0.490 0.449 0.520 0.539 0.622 0.662 0.438 0.609 0.683 0.887 0.560 0.621 0.559
T4 0.422 0.333 0.222 0.300 0.296 0.533 0.630 0.607 0.462 0.482 0.626 0.579 0.637 0.723 0.518 0.678 0.769 0.899 0.611 0.670 0.623
T5 0.453 0.277 0.234 0.228 0.291 0.418 0.565 0.421 0.488 0.508 0.502 0.522 0.516 0.546 0.465 0.553 0.614 0.805 0.515 0.575 0.598
Addressing the weaknesses of current e-service quality measures, Blut et al. (2015) developed a hierarchical model using meta-analysis. This model captures the attributes of online stores more comprehensively. Their results show that e-service quality is a four-dimensional construct comprising website design, customer service, security/privacy, and fulfillment. The hierarchical model also predicts consumer behavior better than other existing measures.
Later, Blut (2016) empirically tested the Blut et al. (2015) model with a sample of 358 U.S. online customers. The study showed that the e-service quality construct conformed to the structure of a higher-order factor model that links online service quality perceptions to distinct and actionable dimensions, including website design, fulfillment, customer service, and security/privacy. The results also demonstrated that overall quality fully mediated the relationships between the dimensions and the outcomes for fulfillment and security, and partially mediated those relationships for customer service and website design.
Based on this literature review, this research uses the hierarchical model to examine the e-service quality of online businesses. It also investigates how e-service quality leads to positive consumer behavior such as repurchase intention, WOM, and site revisit intention. As the literature shows, these behaviors are influenced by satisfaction, trust, and several quality factors of online store websites.
Fig. 1 illustrates the conceptual model for e-service quality in an online shopping context. We adapted the models of Gounaris et al. (2010), Blut (2016), Rasheed and Abadi (2014), and Kitapci et al. (2014) to examine the relationships among customer satisfaction, customer trust, repurchase intention, WOM, and site revisit.
According to Blut (2016), e-service quality comprises four dimensions: website design, customer service, security/privacy, and fulfillment. Website design refers to all elements of the customer experience related to the website, including information quality, website aesthetics, purchase process, website convenience, product selection, price offerings, website personalization, and system availability.
An efficient website should contain three main content categories: information-oriented, transaction-oriented, and customer-oriented (Cox and Koelzer, 2004). A good website design should emphasize usability through attractive aesthetics, convey a strong and associative brand image, and attract customers to visit (Díaz and Koutra, 2013). Customers draw on their experience of using a website to assess an online store’s overall service quality. Hence, we posit:
H1. Website design has a positive association with overall e-service
quality
Customer service refers to the service level and returns handling/return policies during and after the sale (Blut, 2016). Offline businesses typically have service staff who help customers during the purchasing process. In online businesses, customers sometimes complete the entire purchasing process by themselves without customer service assistance (McLean and Wilson, 2016). Some online businesses provide customer service that allows customers to ask for more detailed information about the product they want to buy. Companies usually do this through web-based synchronous media such as live chat facilities, an online help desk, and social network websites (Turel and Connelly, 2013). According to Blut (2016), customer service may contribute to e-service quality. Hence:
H2. Customer service has a positive association with overall e-service
quality
Security/privacy refers to the security of credit card payments and the privacy of shared information (Blut, 2016). A website must emphasize assurance and security to increase its credibility and service quality (Wang et al., 2015). Schmidt et al. (2008) showed that an effective website must feature privacy and security (see also Fortes and Rita, 2016). Purchasing goods from an online website requires entering private information such as name, address, and
contact number, including credit card information (Holloway and Beatty, 2008). Customers are always concerned about whether the website will protect them against fraud after a transaction. Website security and privacy are therefore important in assessing the service quality of online stores. Hence:
H3. Security/privacy has a positive association with overall e-service quality
Table 3 (continued)
IQ WA PP WC PS PO WP SA SL RP SC PR TD OA DC SQ S T RI WOM SR
T6 0.424 0.280 0.152 0.234 0.287 0.402 0.520 0.512 0.351 0.422 0.601 0.617 0.577 0.641 0.428 0.636 0.659 0.850 0.613 0.707 0.611
RI1 0.379 0.270 0.200 0.221 0.262 0.457 0.458 0.439 0.247 0.368 0.451 0.419 0.439 0.558 0.414 0.611 0.685 0.645 0.914 0.779 0.725
RI2 0.303 0.240 0.185 0.172 0.198 0.457 0.503 0.557 0.326 0.364 0.422 0.325 0.510 0.572 0.378 0.574 0.673 0.654 0.935 0.705 0.730
RI3 0.385 0.287 0.224 0.277 0.257 0.393 0.501 0.484 0.399 0.435 0.390 0.292 0.439 0.471 0.367 0.527 0.641 0.629 0.922 0.740 0.672
WOM1 0.537 0.334 0.261 0.324 0.341 0.561 0.573 0.518 0.394 0.485 0.625 0.503 0.523 0.681 0.482 0.705 0.773 0.747 0.762 0.951 0.741
WOM2 0.495 0.281 0.239 0.297 0.349 0.552 0.594 0.449 0.371 0.452 0.512 0.455 0.472 0.645 0.475 0.678 0.748 0.730 0.748 0.952 0.714
WOM3 0.434 0.302 0.233 0.280 0.329 0.468 0.497 0.521 0.263 0.404 0.529 0.483 0.453 0.579 0.421 0.620 0.667 0.641 0.751 0.910 0.651
SR2 0.321 0.161 0.101 0.149 0.163 0.416 0.559 0.520 0.369 0.507 0.524 0.371 0.478 0.562 0.426 0.634 0.696 0.672 0.749 0.705 0.937
SR3 0.409 0.261 0.209 0.266 0.285 0.452 0.492 0.485 0.420 0.466 0.496 0.431 0.472 0.562 0.545 0.637 0.651 0.641 0.681 0.694 0.927
Notes: IQ: Information Quality; WA: Website Aesthetics; PP: Purchase Process; WC: Website Convenience; PS: Product Selection; PO: Price Offerings; WP: Website Personalization; SA: System Availability; SL: Service Level; RP: Returns Handling/Policies; SC: Security; PR: Privacy; TD: Timeliness of Delivery; OA: Order Accuracy; DC: Delivery Condition; SQ: Overall Service Quality; S: Customer Satisfaction; T: Customer Trust; RI: Repurchase Intention; WOM: Word of Mouth; SR: Site Revisit. Bold values are above 0.7.
Fulfillment refers to activities that ensure customers receive what they ordered, including timeliness of delivery, order accuracy, and delivery condition (Blut, 2016). This attribute can only be assessed after payment is made. According to Liao and Keng (2013), customer post-payment dissonance is more likely to occur in online shopping than in offline shopping because customers cannot see the product directly before they purchase it. Companies must ensure delivery timeliness, order accuracy, and delivery condition to provide superior service quality for customers. Order fulfillment is therefore one of the determinants of e-service quality. Hence:
H4. Fulfillment has a positive association with overall e-service quality
Customer satisfaction reflects the customer’s belief in the probability that a service will lead to a positive feeling (Udo et al., 2010). According to Kotler and Keller (2006), customer satisfaction is the consequence of customer experiences during the buying process, and it plays a crucial role in shaping customers’ future behavior, such as online repurchase and loyalty (Pereira et al., 2016). Satisfaction is one of the most important measures of success in the business-to-consumer (B2C) online environment (Shin et al., 2013). A satisfied online customer is likely to shop again and recommend the online retailer to others (e.g., Pereira et al., 2017), while a dissatisfied customer may leave the online retailer, with or without complaining.
Satisfaction is closely related to customer attitudes and intentions, which are part of customer behavior (Holloway et al., 2005), and it directly influences customers’ positive behavioral intentions. Prior literature has confirmed a significant relationship between e-service quality and customer satisfaction (Blut et al., 2015; Gounaris et al., 2010; Kitapci et al., 2014; Udo et al., 2010). Gounaris et al. (2010) argue that e-service quality has a positive effect on satisfaction. E-service quality also has a positive influence, both direct and indirect, on satisfaction and on three behavioral intentions, namely repurchase intention, WOM, and site revisit. Thus, the following hypothesis is proposed to investigate the effect of service quality on customer satisfaction in online shopping.
H5. Overall e-service quality has a positive association with customer
satisfaction
Trust is a major factor in customers’ decisions about whether to buy products from online stores (Fortes et al., 2017). According to Wu et al. (2018), trust can be seen as a belief, confidence, sentiment, or expectation about buyer intention or likely behavior. According to Chang et al. (2013), lack of trust is a major barrier to the adoption of e-commerce. Oliveira et al. (2017) measured three dimensions of customer trust (competence, integrity, and benevolence) and found that customers with high overall trust demonstrated a higher intention to engage in e-commerce. Previous studies show that e-service quality positively influences trust (Chiou and Droge, 2006; Cho and Hu, 2009; Rasheed and Abadi, 2014; Wu et al., 2010, 2018). Alrubaiee & Alkaa’ida (2011) observed that service quality in the healthcare industry has a direct positive effect on customer trust and an indirect positive effect on trust mediated by customer satisfaction. Shopping through the internet involves trust not only between the internet merchant and the customer but also between the customer and the computer system on which the transaction is executed (Lee and Turban, 2001). Trust helps reduce uncertainty when the customer’s familiarity with the transaction security mechanism is insufficient (Wu et al., 2018). Based on these findings, we hypothesize that in online businesses:
H6. Overall e-service quality has a positive association with customer
trust
Table 4
Heterotrait-monotrait (HTMT) ratio.
IQ WA PP WC PS PO WP SA SL RP SC PR TD OA DC SQ S T RI WOM
IQ
WA 0.703
PP 0.639 0.491
WC 0.693 0.774 0.717
PS 0.688 0.648 0.760 0.830
PO 0.589 0.470 0.342 0.389 0.420
WP 0.545 0.365 0.339 0.336 0.378 0.787
SA 0.568 0.474 0.351 0.381 0.367 0.740 0.679
SL 0.487 0.324 0.170 0.287 0.359 0.504 0.729 0.510
RP 0.478 0.331 0.258 0.273 0.363 0.532 0.596 0.478 0.743
SC 0.539 0.340 0.275 0.318 0.328 0.578 0.634 0.675 0.535 0.727
PR 0.500 0.333 0.244 0.291 0.349 0.489 0.513 0.506 0.503 0.614 0.807
TD 0.550 0.390 0.311 0.323 0.360 0.631 0.677 0.778 0.628 0.493 0.574 0.540
OA 0.541 0.400 0.299 0.359 0.400 0.692 0.696 0.772 0.540 0.588 0.752 0.662 0.875
DC 0.517 0.339 0.148 0.249 0.274 0.504 0.501 0.602 0.428 0.430 0.577 0.399 0.576 0.709
SQ 0.615 0.441 0.348 0.392 0.430 0.658 0.696 0.704 0.457 0.568 0.746 0.569 0.745 0.836 0.632
S 0.599 0.386 0.350 0.383 0.385 0.708 0.715 0.721 0.452 0.633 0.813 0.681 0.683 0.855 0.617 0.883
T 0.539 0.365 0.288 0.320 0.386 0.631 0.781 0.734 0.596 0.601 0.722 0.694 0.752 0.822 0.571 0.787 0.898
RI 0.430 0.318 0.259 0.267 0.297 0.557 0.604 0.637 0.416 0.471 0.521 0.416 0.551 0.653 0.462 0.676 0.812 0.761
WOM 0.573 0.357 0.306 0.351 0.411 0.653 0.670 0.620 0.430 0.529 0.670 0.561 0.560 0.755 0.544 0.771 0.867 0.816 0.871
SR 0.450 0.260 0.204 0.255 0.284 0.569 0.670 0.665 0.522 0.607 0.648 0.494 0.580 0.700 0.618 0.774 0.838 0.802 0.870 0.842
Notes: IQ: Information Quality; WA: Website Aesthetics; PP: Purchase Process; WC: Website Convenience; PS: Product Selection; PO: Price Offerings; WP: Website Personalization; SA: System Availability; SL: Service Level; RP: Returns Handling/Policies; SC: Security; PR: Privacy; TD: Timeliness of Delivery; OA: Order Accuracy; DC: Delivery Condition; SQ: Overall Service Quality; S: Customer Satisfaction; T: Customer Trust; RI: Repurchase Intention; WOM: Word of Mouth; SR: Site Revisit.
Customer satisfaction is a critical factor in generating customer loyalty (Pham and Ahammad, 2017). Kotler and Armstrong (2012) stated that customer satisfaction is the key to future buying behavior. Repurchase intention indicates an individual’s willingness to make another purchase from the same company, based on his/her previous experiences (Filieri & Lin, 2017; Hellier et al., 2003). Customers who are satisfied with the service of a service provider tend to increase their usage level and future usage intentions (Henkel et al., 2006). Customer satisfaction and repurchase intentions can be increased by offering superior service quality (Cronin et al., 2000). When customers are satisfied with the product or service they buy, they tend to purchase again from the same supplier. Several studies have found evidence of a positive relationship between customer satisfaction and repurchase intentions (Blut et al., 2015; Kitapci et al., 2014; Pham and Ahammad, 2017; Wolfinbarger and Gilly, 2003).
Customers who have a high level of trust in a website are more likely to intend to purchase from it (Gao, 2011). Moreover, customers who have already purchased from a website and had a good purchase experience are likely to repurchase from the same website. Chek and Ho (2016) found evidence of a positive relationship between customer service, trust, and purchase intention. Based on this evidence, we propose that:
H7. Customer satisfaction has a positive association with repurchase
intention.
H8. Customer trust has a positive association with repurchase intention
Word of mouth (WOM) is product information that individuals
transmit to other individuals (Solomon, 2015). WOM tends to be more
reliable and trustworthy than other messages from formal marketing
channels because customers get the word from people they know (Hwang
& Zhang, 2018; Tuten and Solomon, 2015). WOM communication is an
effective and powerful method to influence purchase decisions, particularly when important information is communicated by reliable and
credible sources (Ennew et al., 2000).
According to Brown et al. (2007), the emergence of the internet has allowed customers to interact with each other quickly and easily, and has given rise to a phenomenon known as interpersonal online influence, or electronic WOM. Customers often use WOM when they are looking for information about brands, products, services, and organizations. WOM continues to be recognized as an important source of information affecting customer product choices (Smith et al., 2005). Unlike offline customers in physical stores, online customers are more likely to rely on recommendations from experienced customers before they purchase, because online services are more intangible and harder to evaluate (Wu et al., 2018).
Companies must be aware of both positive and negative WOM communication, since it is closely related to customer behavioral intentions and affects corporate sales and profits (Jung and Seock, 2017). Customers who trust online retailers tend to recommend them to friends (Wu et al., 2018), implying that customer trust has been shifted to the online retailer. According to Wang (2011), not all satisfied customers spread positive WOM about services, whereas dissatisfied customers have a strong tendency to share their bad experiences with others.
Customers who experience good service quality from an e-commerce site tend to engage in positive WOM communication, with positive WOM being an outcome of customer satisfaction (Kau and Loh, 2006). Kitapci et al. (2014) found that customer satisfaction positively influences WOM intentions. Kim and Stoel (2004) also showed the important role of online trust in customers’ willingness to recommend a brand or website. Customers need to be satisfied with their experience and trust the information provided by the website before they recommend it to others (Loureiro et al., 2018). Therefore, we propose the following hypotheses:
H9. Customer satisfaction has a positive association with WOM.
H10. Customer trust has a positive association with WOM
Site visitors’ perceived service quality is a significant indicator of satisfaction as well as of post-visit behavioral intentions such as site revisits (Leung et al., 2011). The more positive the customer feels about a particular site after an interaction, the more likely the customer is to return to that site (Gounaris et al., 2010). Another key issue for online service companies is whether a customer decides to return to an internet site. The decision to revisit a site resembles customer service switching behavior (Keaveney, 1995), in which a customer keeps using the online service category but switches from one service provider to another. Taylor and Strutton (2010) predicted intentions to return to a website, and Gounaris et al. (2010) confirmed that the relationship between customer satisfaction and site revisit was significantly positive. In general, customers tend to draw on their past retail service experience when making decisions, in order to formulate strategies for repeat behavior. Therefore, the following hypothesis is proposed:
H11. Customer satisfaction has a positive association with site revisit
3. Methodology
The research targeted specific groups of respondents who could provide the information needed for this research and who met set criteria. The respondents were screened to ensure that they remembered their most recent experience of using an online retailer website. The selection criteria were that respondents be Indonesian internet users who had visited, bought from, or used the service offered by online retailers at least once during the previous six months. The target population in this study comprised all male and female Indonesian adults over the age of 17.
To test the proposed model, a questionnaire was developed. Data were collected through an online questionnaire created with Google Docs, and the link was shared on social media such as Facebook, LINE, and WhatsApp. The shared link directed respondents to a website containing the self-administered questionnaire. Respondents were instructed to answer based on the most recent online store they had used during the previous six months.
Overall e-service quality was defined as the overall excellence or
superiority of the service (Zeithaml, 1988). The three items of overall
e-service quality were adapted from Blut (2016). The model constructs
were measured by combining items from WebQual, E-S-Qual, and eTailQ
(Holloway and Beatty, 2008; Parasuraman et al., 2005; Wolfinbarger and
Gilly, 2003). The measurement of e-service quality was assigned to four
dimensions: website design, customer service, security/privacy, and
fulfillment. Based on Blut (2016), e-service quality dimensions were
operationalized as a reflective-formative type (Ringle et al., 2012). The
first-order dimensions of website design consisted of eight attributes:
information quality, website aesthetics, purchase process, website convenience, product selection, price offerings, website personalization, and
system availability. The first-order dimensions of customer service consisted of two attributes: service level and returns handling/policies. The
first-order dimension of security/privacy consisted of two attributes:
security and privacy. Lastly, the first-order dimension of fulfillment
consisted of three attributes: timeliness of delivery, order accuracy, and
delivery condition.
The customer satisfaction scale was adapted from Fornell (1992) and
customer trust was measured by six items adopted from Gefen (2002),
Lee and Turban (2001) and Urban et al. (2009). Repurchase intention and WOM were measured with items adopted from Zeithaml et al. (1996). The site revisit items were developed from Gounaris et al. (2010). All of the constructs and reflective items were measured using a seven-point scale ranging from 1 (strongly disagree) to 7 (strongly agree) (Table 1).
This research used partial least squares (PLS) path modeling, as implemented in the SmartPLS software, to assess the validity and reliability of the measurement model. Composite reliability (CR), factor loadings, and average variance extracted (AVE) were used to test convergent
validity. This is considered acceptable when each individual item’s factor loading is greater than 0.70, composite reliability exceeds 0.70, and AVE exceeds 0.50 (Gefen et al., 2000). A factor loading exceeding 0.50 is acceptable, while a value exceeding 0.70 shows strong evidence of convergent validity (Bagozzi and Yi, 1988). All factor loading estimates exceeded 0.70 except those of T1 and SR1, which were therefore eliminated, and bootstrap t-statistics showed strong evidence of convergent validity. The AVE of each reflective construct also exceeded 0.50 (ranging from 0.641 to 0.880), as shown in Table 2, indicating that most of the variance of each indicator was explained by its own construct. Thus, convergent validity was confirmed.
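The thresholds above can be made concrete with a short calculation. The Python sketch below computes AVE (the mean of the squared standardized loadings) and composite reliability, CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of (1 - loading^2)), for a single construct. It assumes standardized indicators whose error variance is 1 - loading^2, and it borrows the IQ1 to IQ3 loadings from Table 3 purely as sample input; it illustrates the standard formulas and is not the SmartPLS implementation.

```python
from math import fsum, sqrt


def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized loadings."""
    return fsum(l * l for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances).

    Assumes standardized indicators, so each error variance is 1 - loading^2.
    """
    squared_sum = fsum(loadings) ** 2
    error_variance = fsum(1.0 - l * l for l in loadings)
    return squared_sum / (squared_sum + error_variance)


# Outer loadings of IQ1-IQ3 taken from Table 3 (sample input only).
iq_loadings = [0.919, 0.906, 0.842]

ave = average_variance_extracted(iq_loadings)
cr = composite_reliability(iq_loadings)
print(f"AVE = {ave:.3f}, sqrt(AVE) = {sqrt(ave):.3f}, CR = {cr:.3f}")
# Check the cut-offs quoted in the text: loadings > 0.70, CR > 0.70, AVE > 0.50.
print("meets thresholds:", ave > 0.50 and cr > 0.70 and min(iq_loadings) > 0.70)
```

With these inputs the sketch prints AVE = 0.791, sqrt(AVE) = 0.890 and CR = 0.919; the square root of AVE is the quantity reported on the diagonal of Table 2.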
This research used three measures to assess discriminant validity: the Fornell-Larcker criterion, cross-loadings, and the heterotrait-monotrait (HTMT) ratio of correlations. According to Hair et al. (2010), discriminant validity ensures that a construct measure is empirically unique and represents phenomena of interest that other measures in a structural equation model do not capture. Discriminant validity is established if a latent variable accounts for more variance in its associated indicator variables than it shares with other constructs in the same model (Fornell and Larcker, 1981). Table 2 shows the square roots of the AVEs (in bold) compared with the correlations between constructs. Since the square root of each construct’s AVE was higher than its correlations with the other constructs, this criterion was met. A second approach for establishing discriminant validity is cross-loadings. According to Chin (1998), each indicator loading should be greater than all of its cross-loadings. Table 3 shows that each indicator loading (in bold) is greater than all of its cross-loadings. The third approach is the HTMT ratio of correlations. If the HTMT value between two reflective constructs is below 0.90, discriminant validity has been established between them (Henseler et al., 2014). All constructs had HTMT values below 0.90, as shown in Table 4. Thus, the discriminant validity of the measurement model was also established.
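Each of the three discriminant validity checks reduces to a simple comparison. The sketch below applies them to one pair of constructs, customer satisfaction (S) and customer trust (T), using values copied from Tables 2, 3, and 4. The dictionaries and function names are hypothetical and hold only the excerpts this example needs, not the full matrices.

```python
# Column order shared by Tables 2, 3 and 4.
CONSTRUCTS = ["IQ", "WA", "PP", "WC", "PS", "PO", "WP", "SA", "SL", "RP", "SC",
              "PR", "TD", "OA", "DC", "SQ", "S", "T", "RI", "WOM", "SR"]

# Excerpts copied from the tables (only what this sketch needs).
SQRT_AVE = {"S": 0.880, "T": 0.855}            # Table 2, diagonal
CORRELATION = {frozenset({"S", "T"}): 0.795}    # Table 2, T row, S column
HTMT = {frozenset({"S", "T"}): 0.898}           # Table 4, T row, S column
S1_LOADINGS = dict(zip(CONSTRUCTS, [            # Table 3, row S1
    0.500, 0.363, 0.248, 0.324, 0.316, 0.551, 0.580, 0.578, 0.461, 0.546, 0.655,
    0.542, 0.651, 0.796, 0.647, 0.819, 0.912, 0.750, 0.701, 0.751, 0.747]))


def fornell_larcker_ok(a, b):
    """Both constructs' sqrt(AVE) must exceed their mutual correlation."""
    r = CORRELATION[frozenset({a, b})]
    return SQRT_AVE[a] > r and SQRT_AVE[b] > r


def htmt_ok(a, b, threshold=0.90):
    """An HTMT ratio below the threshold (0.90 in the text) supports discriminant validity."""
    return HTMT[frozenset({a, b})] < threshold


def cross_loadings_ok(own_construct, loadings):
    """An indicator's loading on its own construct must exceed every cross-loading."""
    own = loadings[own_construct]
    return all(own > v for name, v in loadings.items() if name != own_construct)


print(fornell_larcker_ok("S", "T"), htmt_ok("S", "T"), cross_loadings_ok("S", S1_LOADINGS))
```

For this pair all three checks pass, although the HTMT value of 0.898 sits only just below the 0.90 cut-off; a stricter 0.85 threshold would flag it.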
Cronbach’s alpha can be used to assess the internal consistency reliability of the instruments. Cronbach’s alpha should be 0.70 or higher, although values of 0.60 or higher are acceptable for exploratory purposes (Hair et al., 2011). All reflective constructs proved to be reliable, since all Cronbach’s alpha values were greater than 0.70 (ranging from 0.770 to 0.931), as shown in Table 2.
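Cronbach’s alpha is k / (k - 1) * (1 - sum of item variances / variance of the summed score), where k is the number of items. The sketch below implements this formula with Python’s statistics module. The responses are made up for illustration (three items answered by five hypothetical respondents on a seven-point scale) and are not data from this study.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: one list of scores per item, all for the same respondents."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's summed score
    item_variance = sum(variance(scores) for scores in items)  # sample variances (n - 1)
    return k / (k - 1) * (1 - item_variance / variance(totals))


# Hypothetical seven-point responses from five respondents to a three-item scale.
responses = [
    [5, 6, 7, 4, 6],  # item 1
    [5, 7, 6, 4, 5],  # item 2
    [6, 6, 7, 3, 5],  # item 3
]

alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.3f}", "reliable" if alpha >= 0.7 else "check scale")
```

Because the three hypothetical items move closely together, alpha comes out at about 0.90, above the 0.70 guideline cited in the text.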
In this study, the e-service quality dimensions (website design, customer service, security/privacy, and fulfillment) were second-order constructs of the reflective-formative type (Ringle et al., 2012). Each of their first-order constructs was reflective, while the relationships between the e-service quality attributes (first-order constructs) and the e-service quality dimensions (second-order constructs) were formative. Hence, multicollinearity was tested, and the significance and signs of the weights were examined. Based on the significance and signs of the weights, all four e-service quality dimensions were statistically significant (p < 0.01), and all had positive signs. Table 5 shows that all VIF values of the first-order constructs (ranging from 1.607 to 3.065) were below the threshold of 3.3 (Lee and Xia, 2010), so multicollinearity was concluded to be non-problematic. Thus, the formative constructs could be used to test the structural model.
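For standardized predictors, the VIF of predictor j equals 1 / (1 - R_j^2), which is also the j-th diagonal element of the inverse of the predictors’ correlation matrix. The pure-Python sketch below uses that identity. As sample input it takes the latent correlation of 0.795 between customer satisfaction and customer trust from Table 2; these two constructs jointly predict repurchase intention, and the result is close to the 2.717 reported for them in Table 6 (the small gap comes from rounding the input). The helper names are hypothetical, and this is not SmartPLS’s own routine.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Pick the row with the largest entry in this column for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif_from_correlations(corr):
    """VIF_j = j-th diagonal element of the inverse correlation matrix = 1 / (1 - R_j^2)."""
    inv = invert(corr)
    return [inv[j][j] for j in range(len(corr))]


# Satisfaction (S) and trust (T) as joint predictors; r = 0.795 from Table 2.
corr_s_t = [[1.0, 0.795],
            [0.795, 1.0]]
vifs = vif_from_correlations(corr_s_t)
print([round(v, 3) for v in vifs], all(v < 3.3 for v in vifs))
```

The same function accepts a larger correlation matrix, for example the first-order constructs behind one second-order dimension, in which case each diagonal element gives that construct’s VIF to compare against the 3.3 threshold.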
4. Results
In hypothesis testing, eleven paths were examined in the structural model, which is specified by the following equations:
$$\mathit{SQ} = \beta_0 + \beta_1\,\mathit{WD} + \beta_2\,\mathit{CS} + \beta_3\,\mathit{SP} + \beta_4\,\mathit{FF} + u$$
where SQ (overall e-service quality) is the dependent variable; WD (website design), CS (customer service), SP (security/privacy), and FF (fulfillment) are the independent variables; $\beta_0$ is the intercept parameter; $\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$ are the slope parameters in the relationship between the dependent variable and the independent variables; and $u$ is the error term.
$$\mathrm{S} = \beta_0 + \beta_1 \mathrm{SQ} + u$$
where S (customer satisfaction) is the dependent variable; SQ (overall
e-service quality) is the independent variable; β0 is the intercept
parameter; β1 is the slope parameter in the relationship between the
dependent variable and the independent variable; and u is the error term
for the observation.
$$\mathrm{T} = \beta_0 + \beta_1 \mathrm{SQ} + u$$
where T (customer trust) is the dependent variable; SQ (overall e-service quality) is the independent variable; β0 is the intercept parameter;
β1 is the slope parameter in the relationship between the dependent
variable and the independent variable; and u is the error term for the
observation.
$$\mathrm{RI} = \beta_0 + \beta_1 \mathrm{S} + \beta_2 \mathrm{T} + u$$
where RI (repurchase intention) is the dependent variable; S
(customer satisfaction) and T (customer trust) are the independent variables; β0 is the intercept parameter; β1 and β2 are the slope parameters in
the relationship between the dependent variable and the independent
variables; and u is the error term for the observation.
Table 5
Formative measurement model evaluation.

| Formative construct (second-order construct) | Reflective construct (first-order construct) | Weight | VIF |
|---|---|---|---|
| Website Design | Information Quality | 0.208*** | 2.328 |
| | Website Aesthetics | 0.184*** | 2.265 |
| | Purchase Process | 0.131*** | 1.833 |
| | Website Convenience | 0.184*** | 2.999 |
| | Product Selection | 0.162*** | 2.375 |
| | Price Offerings | 0.162*** | 2.025 |
| | Website Personalization | 0.177*** | 1.904 |
| | System Availability | 0.158*** | 1.730 |
| Customer Service | Service Level | 0.486*** | 1.607 |
| | Return Handling/Policies | 0.626*** | 1.607 |
| Security/Privacy | Security | 0.543*** | 1.968 |
| | Privacy | 0.541*** | 1.968 |
| Fulfillment | Timeliness of Delivery | 0.429*** | 2.548 |
| | Order Accuracy | 0.442*** | 3.065 |
| | Delivery Condition | 0.261*** | 1.631 |

Notes: *p < 0.10; **p < 0.05; ***p < 0.01.

Table 6
Construct collinearity assessment (VIF). Columns are the dependent constructs; rows are their predictors.

| Construct | e-Service Quality | Customer Satisfaction | Customer Trust | Repurchase Intention | Word of Mouth | Site Revisit |
|---|---|---|---|---|---|---|
| Website Design | 1.862 | | | | | |
| Customer Service | 1.827 | | | | | |
| Security/Privacy | 2.099 | | | | | |
| Fulfillment | 2.226 | | | | | |
| e-Service Quality | | 1.000 | 1.000 | | | |
| Customer Satisfaction | | | | 2.717 | 2.717 | 1.000 |
| Customer Trust | | | | 2.717 | 2.717 | |
P. Rita et al. Heliyon 5 (2019) e02690
$$\mathrm{WOM} = \beta_0 + \beta_1 \mathrm{S} + \beta_2 \mathrm{T} + u$$
where WOM (word-of-mouth) is the dependent variable; S (customer
satisfaction) and T (customer trust) are the independent variables; β0 is
the intercept parameter; β1 and β2 are the slope parameters in the relationship between the dependent variable and the independent variables;
and u is the error term for the observation.
$$\mathrm{SR} = \beta_0 + \beta_1 \mathrm{S} + u$$
where SR (site revisit) is the dependent variable; S (customer satisfaction) is the independent variable; β0 is the intercept parameter; β1 is
the slope parameter in the relationship between the dependent variable
and the independent variable; and u is the error term for the observation.
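Once scores for each construct are available, the slopes in the equations above can be estimated by regressing each dependent construct on its predictors. The sketch below assumes standardized construct scores (PLS-SEM reports path coefficients on a standardized scale) and reproduces only this final regression step, not the iterative PLS algorithm that produces the scores. After standardization the intercept β0 is zero, so it is dropped. The function names are illustrative:

```python
import numpy as np


def standardize(v: np.ndarray) -> np.ndarray:
    """Center each column and scale it to unit sample standard deviation."""
    return (v - v.mean(axis=0)) / v.std(axis=0, ddof=1)


def path_coefficients(y: np.ndarray, X: np.ndarray) -> tuple[np.ndarray, float]:
    """Standardized slopes and R^2 for y = b1*x1 + ... + bk*xk + u.

    y -- dependent construct scores, shape (n,)
    X -- predictor construct scores, shape (n, k)
    """
    ys = standardize(y)
    Xs = standardize(X)
    # Zero means after standardization, so no intercept column is needed
    beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    resid = ys - Xs @ beta
    r2 = 1 - (resid @ resid) / (ys @ ys)
    return beta, r2
```

For example, the RI equation would correspond to `path_coefficients(ri, np.column_stack([s, t]))` for hypothetical score arrays `ri`, `s` and `t`.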
To test the paths above, we first assessed construct
multi-collinearity using the variance inflation factor (VIF).
Small VIF values indicate low correlation among constructs.
According to Lee and Xia (2010), if the VIF values are below the
threshold of 3.3, there is no problem with multi-collinearity. Table 6
shows that all VIF values (ranging from 1.000 to 2.717) were below the
threshold of 3.3, so the extent of multi-collinearity was concluded to be
non-problematic.
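The pattern in Table 6 follows from the VIF formula, assuming the VIFs are computed from the construct scores. The regressions for customer satisfaction and customer trust (each on e-service quality alone) and for site revisit (on customer satisfaction alone) have a single predictor, so that predictor’s R² is 0 and its VIF is exactly 1.000. Repurchase intention and WOM share the same two predictors, S and T; with two predictors, each one’s R² is the squared correlation between them, so both VIFs coincide. The back-calculated correlation below is our own arithmetic, not a value reported in the study:

$$\mathrm{VIF}_S = \mathrm{VIF}_T = \frac{1}{1 - r_{ST}^2} = 2.717 \quad\Longrightarrow\quad r_{ST} = \sqrt{1 - \frac{1}{2.717}} \approx 0.795$$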
Hypotheses were tested on the significance of the path
coefficients using the bootstrapping technique (Hair et al., 2011) with
5000 resamples, each bootstrap sample containing the same number of
observations as the original sample (in this instance, 355 cases). Of the
eleven path coefficients, ten supported their hypotheses, while one
hypothesis failed to be confirmed. The results of the hypothesis testing are
shown in Fig. 2.
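The bootstrapping procedure described above can be sketched for a single path as follows. This is a simplified stand-in, not the study’s PLS-SEM software. It assumes construct scores are already available, re-estimates the standardized path coefficient by OLS on each of `n_boot` resamples of the cases (drawn with replacement, each the size of the original sample), and derives a t-value from the bootstrap standard error. The function name, the normal approximation for the p-value, and the percentile interval are our illustrative choices:

```python
import math

import numpy as np


def bootstrap_path(y: np.ndarray, X: np.ndarray, col: int,
                   n_boot: int = 5000, seed: int = 0):
    """Bootstrap one standardized path coefficient (illustrative sketch).

    y   -- scores of the dependent construct, shape (n,)
    X   -- scores of its predictor constructs, shape (n, k)
    col -- index of the predictor whose path is tested
    """
    rng = np.random.default_rng(seed)
    n = len(y)

    def beta(yy: np.ndarray, XX: np.ndarray) -> float:
        # Standardize, then OLS without intercept (means are zero)
        ys = (yy - yy.mean()) / yy.std(ddof=1)
        Xs = (XX - XX.mean(axis=0)) / XX.std(axis=0, ddof=1)
        return np.linalg.lstsq(Xs, ys, rcond=None)[0][col]

    estimate = beta(y, X)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # n cases drawn with replacement
        draws[b] = beta(y[idx], X[idx])

    se = draws.std(ddof=1)                  # bootstrap standard error
    t = estimate / se
    p = math.erfc(abs(t) / math.sqrt(2))    # two-sided p, normal approximation
    ci = np.percentile(draws, [2.5, 97.5])  # 95% percentile interval
    return estimate, se, t, p, ci
```

With 355 cases and the default `n_boot=5000`, this mirrors the resampling scheme reported above; a path would be judged significant when `p` falls below the chosen level.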
The conceptual model explained 64.6% of the variation in overall
e-service quality, with a predictive relevance Q² of 0.522; since Q² is
greater than zero, the model has predictive relevance. The paths from website design (β̂ =
0.225; p < 0.01), security/privacy (β̂ = 0.205; p < 0.01), and fulfillment
(β̂ = 0.507; p < 0.01) are statistically significant, whereas the path from customer
service (β̂ = -0.001; p > 0.10) is not. Therefore,
hypotheses H1, H3, and H4 are supported, but H2 is not supported
in explaining overall e-service quality.
The conceptual model explained 62.4% of the variation in customer
satisfaction and 51.6% of the variation in customer trust,
with predictive relevance Q² of 0.453 and 0.354, respectively. The paths from overall e-service quality to customer satisfaction (β̂ =
0.791; p < 0.01) and to customer trust (β̂ = 0.719; p < 0.01) are statistically significant.
Therefore, hypotheses H5 and H6 are supported.
The conceptual model explained 55.9% of the variation in repurchase
intention, with predictive relevance Q² of 0.451. The effects of
customer satisfaction (β̂ = 0.459; p < 0.01) and customer trust (β̂ = 0.331; p < 0.01) on repurchase intention are statistically significant. Therefore, hypotheses H7 and H8 are supported in explaining repurchase intention.
The conceptual model explained 65.6% of the variation in WOM, with
predictive relevance Q² of 0.545. The effects of customer satisfaction
(β̂ = 0.488; p < 0.01) and customer trust (β̂ = 0.367; p < 0.01) on
WOM are statistically significant. Therefore, hypotheses H9 and H10 are supported in explaining WOM.
The conceptual model explained 52.2% of the variation in site revisit,
with predictive relevance Q² of 0.434. The effect of customer
satisfaction on site revisit (β̂ = 0.723; p < 0.01) is statistically
significant. Therefore, hypothesis H11 is supported in explaining site revisit.
The strength of the relationship between constructs for each hypothesis is shown by Cohen’s f² value. Cohen (1988) defined values near
0.02 as small, near 0.15 as medium, and above 0.35 as large. Thus,
overall e-service quality had a large impact on both customer satisfaction
and customer trust. Customer satisfaction had a large impact on site
revisit and a medium impact on repurchase intention and WOM.
Customer trust had a medium impact on repurchase intention and
WOM. Fulfillment had a medium impact on e-service quality, while
security/privacy and website design had a small impact on overall
e-service quality.
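Cohen’s f² for a path compares the R² of the dependent construct with and without that predictor: f² = (R²included − R²excluded) / (1 − R²included). A minimal NumPy sketch, assuming construct scores are available as plain arrays and using ordinary least squares for each R². The study’s values come from its PLS-SEM estimation, so this only illustrates the formula; the function names are ours:

```python
import numpy as np


def r_squared(y: np.ndarray, X: np.ndarray) -> float:
    """R^2 of an OLS regression of y on the columns of X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    centered = y - y.mean()
    return 1 - (resid @ resid) / (centered @ centered)


def cohens_f2(y: np.ndarray, X: np.ndarray, drop: int) -> float:
    """f^2 for predictor column `drop`: (R2_incl - R2_excl) / (1 - R2_incl)."""
    r2_incl = r_squared(y, X)
    r2_excl = r_squared(y, np.delete(X, drop, axis=1))
    return (r2_incl - r2_excl) / (1 - r2_incl)
```

The result is then read against Cohen’s benchmarks of 0.02, 0.15 and 0.35 quoted above.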
Fig. 2. Estimated model. Notes: (n.s.) = not significant; *p < 0.10; **p < 0.05; ***p < 0.01.
5. Discussion
This study was designed to investigate e-service quality in online
businesses and develop new knowledge about the most important dimensions of e-service quality. The study also aimed to enhance
prior understanding of how e-service quality affects customer satisfaction, customer trust, and customer behavior, i.e., repurchase intention,
WOM, and site revisit. Table 7 summarizes the results of the hypothesis tests
of this study.
Previous studies suggested applying the e-service quality measurement to other countries to test whether the measurement worked equally
well in a different country and cultural setting (Blut, 2016; Gounaris
et al., 2010). This study found that three of the
four dimensions of e-service quality (website design, security/privacy,
and fulfillment) had a positive impact on e-service quality, whereas the
customer service dimension did not have a significant impact on e-service quality.
Thus, a company needs to pay specific attention to these three dimensions and seek breakthroughs that can improve its performance and
e-service quality. The literature emphasizes that the e-service quality dimensions are strongly related to the perception of overall e-service
quality, with website design having the highest impact on e-service quality and
customer service the lowest (Blut, 2016). In this study,
fulfillment had the highest impact on e-service quality, while website design
and security/privacy had almost the same impact.
Surprisingly, in the Indonesian context, customer service was not relevant to the perception of the overall e-service quality of an online store.
According to Wolfinbarger and Gilly (2003), not all customers need
customer service in each transaction, so customer service is only weakly
related to quality. In contrast, in the Blut et al. (2015) study, security was
not relevant to overall e-service quality in the four-dimension e-service
quality model. Meanwhile, Wolfinbarger and Gilly (2003) found that
customer service and security were not significant for e-service quality.
Differences in country culture may lead to different outcomes regarding which attributes and dimensions of e-service quality matter in forming the
perception of overall e-service quality. Thus, the results of this study were
compared with those of a previous study that used the same e-service quality measurements: Blut (2016), which examined online
shoppers in the U.S. Fig. 3 shows that Indonesia and the U.S. have
different country cultures in terms of power distance, individualism, and
long-term orientation. Blut et al. (2015) found that collectivism
strengthens the association between fulfillment and overall e-service
quality. Consistent with this, in this study fulfillment had a higher
impact on overall e-service quality than the other three service quality
dimensions.
From the power distance perspective, customers in a high power distance
culture expect companies providing e-services to offer more
security (Hofstede, 1984). High power distance strengthens the effect
of security on overall e-service quality (Blut et al., 2015). In this study,
security/privacy had only a small impact on overall e-service quality, but the
effect was significant and should not be underestimated. Online stores, in particular, must protect
customers’ private information to convince customers to purchase
goods from them.
From the standpoint of long-term orientation (LTO), Indonesia’s high
score indicates a pragmatic culture, while the U.S. has a
normative culture. According to Hofstede (1984), normative cultures
tend to analyze new information to check whether it is true. In countries
with low LTO, information is therefore important, so low LTO strengthens the
association between website design and overall service quality. Accordingly, in
the Blut (2016) study, website design had a higher impact on overall
service quality than the other three service quality dimensions. In Indonesia, a country
with a pragmatic culture, website design had only a small impact on
overall e-service quality, but its importance should not be underestimated. An online store’s website should at least be visually
appealing, easy to read, and provide enough information about the
products it sells.
Customer satisfaction and customer trust appeared as the outcomes of
overall e-service quality in the model. The results of this study showed
that e-service quality had a positive impact on customer satisfaction. Most
research on e-service quality links it closely to customer
satisfaction, supporting the idea that there is a significant relationship between e-service
quality and customer satisfaction (Kitapci et al., 2014). E-service quality
also had a positive impact on customer trust: the better the e-service
quality of a company, the higher the customer trust. Providing good
service quality thus enhances both customer satisfaction and customer trust. This
result is in line with previous studies by Wu et al. (2010) and
Wu et al. (2018).
The investigation found that customer satisfaction had a positive
impact on repurchase intention, word-of-mouth, and site revisit. According to Wolfinbarger and Gilly (2003), when customers are satisfied
with a product or service they buy, they will purchase it again from the
same provider in the future. Gounaris et al. (2010) examined the relationship of satisfaction to customers’ behavioral intentions (purchase
intention, site revisit, and WOM) in the context of internet shopping. In
line with the Gounaris et al. (2010) study, the findings of this study
showed that customer satisfaction had a higher impact on site revisit
than on repurchase intention and WOM.
Customer trust had a positive impact on repurchase intention and
word-of-mouth. The more a customer trusts a company, the more likely
(s)he is to recommend the company to others. Gremler et al. (2001)
showed that trust has a positive effect on making a recommendation.
Because online services are difficult to evaluate, customers are likely
to rely on recommendations from experienced customers. In line with this,
customer trust had a higher impact on WOM than on
repurchase intention in this study.

Table 7
Structural relationship test results.

| Hypothesis | Hypothesis statement | Path coefficient (Sig. value) | Effect size (f²) | Conclusion |
|---|---|---|---|---|
| H1 | Website design has a positive association with overall e-service quality | 0.225*** (0.000) | 0.077 | H1 supported |
| H2 | Customer service has a positive association with overall e-service quality | -0.030 (0.494) | 0.001 | H2 not supported |
| H3 | Security/privacy has a positive association with overall e-service quality | 0.205*** (0.001) | 0.057 | H3 supported |
| H4 | Fulfillment has a positive association with overall e-service quality | 0.507*** (0.000) | 0.329 | H4 supported |
| H5 | Overall e-service quality has a positive association with customer satisfaction | 0.791*** (0.000) | 1.668 | H5 supported |
| H6 | Overall e-service quality has a positive association with customer trust | 0.516*** (0.000) | 1.071 | H6 supported |
| H7 | Customer satisfaction has a positive association with repurchase intention | 0.459*** (0.000) | 0.177 | H7 supported |
| H8 | Customer trust has a positive association with repurchase intention | 0.331*** (0.000) | 0.092 | H8 supported |
| H9 | Customer satisfaction has a positive association with WOM | 0.488*** (0.000) | 0.256 | H9 supported |
| H10 | Customer trust has a positive association with WOM | 0.367*** (0.000) | 0.145 | H10 supported |
| H11 | Customer satisfaction has a positive association with site revisit | 0.723*** (0.000) | 1.097 | H11 supported |

Note: statistical significance p < 0.001.
6. Conclusion
This study is an extensive inquiry into e-service quality in online
business. This exploratory research identified which e-service
quality attributes are relevant to Indonesia-based online stores, using
the four-dimension e-service quality model suggested by Blut et al.
(2015), and measured the impact of e-service quality on customer satisfaction and customer trust, which in turn affect repurchase
intention, word-of-mouth, and site revisit, using the model developed by
Blut (2016), Gounaris et al. (2010), Kitapci et al. (2014), and Rasheed
and Abadi (2014). This research adopted one of the most comprehensive
models of e-service quality, which predicts customer behavior
better than other widely used scales and does not overestimate the importance
of e-service quality attributes. The results are expected to extend
knowledge about how the relevance of e-service quality attributes differs vis-à-vis country culture. The findings show that website
design, security/privacy, and fulfillment are essential to building superior service quality in an online store, while customer service is not an
important dimension of e-service quality in the Indonesian context.
The conceptualization of e-service quality used in this study has been shown to
predict customer behavior better than other commonly
used measurements such as WebQual and E-S-QUAL (Blut et al., 2015).
Based on the literature review, the hierarchical model of e-service quality
is the best available model for determining e-service quality in terms of
its ability to predict consumer behavior, and it captures online store
attributes more comprehensively. However, only one study, Blut (2016), was found
to use the measurement developed by Blut et al. (2015). Many studies still
adopt WebQual, SERVQUAL, and E-S-QUAL to measure
e-service quality. This research therefore combined the hierarchical model of
e-service quality with trust, which is important because it reinforces the
adoption of e-commerce. Previous studies examined the hierarchical
model only with satisfaction, repurchase intention, and WOM in a single
country. To the best of our knowledge, this is the first time that the
hierarchical model has been combined with trust.
By adopting a model that is not yet widely used, this study presents
a new understanding of the e-service quality of online businesses, especially
how country culture matters and which dimensions of service quality
have the most impact on the perception of overall service quality.
This research contributes to wider scientific knowledge by comparing
the implementation of two hierarchical models of e-service quality in two
different country cultural settings, using the outcomes of this study and
the results of a previous study by Blut (2016), a comparison that had not been made before.
The findings give managers insight into how e-service quality is formed and how important each attribute and dimension of e-service quality is for ensuring customer satisfaction and trust,
which in the end can help retain online customers. Managers can
improve the service quality of online stores based on the results of this
research combined with recent market trends. Consider, for example,
security/privacy, which mostly relates to the safety of credit card information. In Indonesia, 52 percent of orders are paid cash on
delivery, followed by ATM/bank transfer (45 percent) and credit card (2
percent) (ecommerceIQ, 2018). With cash on delivery and bank
transfer, customers do not need to worry about the security of their
payment card data.
Managers should carefully consider the attributes of e-service quality
when developing their online stores. To provide superior service quality,
companies should offer an excellent website design with
sufficient information, visually appealing content, easy payment, easy-to-read text, discounts and/or promotions, and
fast loading. Beyond that, companies must ensure timely delivery and the security and privacy of customers’ data. In
the Indonesian context, customer service was not found to be significant for
overall service quality, so managers should focus on website design, security/privacy, and fulfillment. Managers can hire a website designer to
create attractive websites. Since fulfillment had the highest impact on
overall service quality, managers must make sure that products are
delivered in good condition and within the promised time. Partnering
with several courier services and letting customers
choose among them might be a good idea. Managers should also
agree with delivery services in advance on which party is responsible if products are damaged
during delivery, so that damage does not harm customer satisfaction and trust.
[Fig. 3: bar chart of Hofstede scores for Indonesia and the United States on Power Distance, Individualism, Masculinity, Uncertainty Avoidance, Long Term Orientation, and Indulgence.]
Fig. 3. Hofstede country comparison: Indonesia and the United States. Source: Hofstede Insights website (n.d.)
Since customer satisfaction and customer trust significantly affect
customer behavior, managers should incorporate them into their marketing
strategy. Online stores usually have feedback features on their websites.
A company can reinforce WOM by providing a “share feedback with
friends” feature. After customers receive the goods they ordered, they can
write feedback on the online store website and have the option to
share their experience with their friends as WOM. Small rewards,
such as special discounts on the next purchase, will encourage customers to
spread their buying experience to others, which can bring more potential
customers to a company’s online store.
The huge number of smartphone users in Indonesia is a major opportunity to develop mobile online store applications. Investing more in
the development of mobile access and giving priority to the development
of features in mobile applications might help to increase the e-service
quality of online stores. Managers could also make mobile-friendly
websites.
This study has several limitations that could be addressed in future
research. First, this study used a non-probability sampling method. The
sample of this study was also limited to customers who had experience
using online retailer websites in Indonesia. The research outcomes may
lack generalizability.
Second, this study analyzed the e-service quality of online stores in
general, not based on the product segments sold in the online store. The
measurement used in this study may not be applicable to assess some
product segments. Future research should consider a variety of product
segments and/or other industries to make sure that the measurement
works equally well for specific product categories. In other industry
settings, the measurement may need to be adjusted.
Finally, this research tested only the direct effects of each variable,
without considering potential moderating effects among variables.
Future research should examine these moderating effects more closely.
Future research could also replicate this study in other cultural
contexts and other industries in order to generalize the results.
Declarations
Author contribution statement
Paulo Rita, Tiago Oliveira, Almira Farisa: Conceived and designed the
experiments; Performed the experiments; Analyzed and interpreted the
data; Contributed reagents, materials, analysis tools or data; Wrote the
paper.
Funding statement
This research did not receive any specific grant from funding agencies
in the public, commercial, or not-for-profit sectors.
Competing interest statement
The authors declare no conflict of interest.
Additional information
No additional information is available for this paper.
References
Alrubaiee, L., Alkaa’ida, F., 2011. The mediating effect of patient satisfaction in the
patients’ perceptions of health quality-patient trust relationship. Int. J. Mark. Stud. 3
(1), 103–127.
Bagozzi, R.P., Yi, Y., 1988. On the evaluation of structural equation models. J. Acad.
Mark. Sci. 16 (1), 74–94.
Barnes, S.J., Vidgen, R.T., 2002. An integrative approach to the assessment of e-commerce
quality. J. Electron. Commer. Res. 3 (2), 114–127.
Blut, M., 2016. E-service quality: development of a hierarchical model. J. Retail. 92 (4),
500–517.
Blut, M., Chowdhry, N., Mittal, V., Brock, C., 2015. E-service quality: a meta-analytic
review. J. Retail. 91 (4), 679–700.
Brady, M.K., Robertson, C.J., 2001. Searching for a consensus on the antecedent role of
service quality and satisfaction: an exploratory cross-national study. J. Bus. Res. 51
(1), 53–60.
Brown, J., Broderick, A.J., Lee, N., 2007. Word of mouth communication within online
communities: conceptualizing the online social network. J. Interact. Mark. 21 (3),
2–20.
Brusch, I., Schwarz, B., Schmitt, R., 2019. David versus Goliath – service quality factors for
niche providers in online retailing. J. Retail. Consum. Serv. 50, 266–276.
Business.com, 2017. Why Some Customers Prefer Online Business to Traditional Retail
Stores.
Chang, H. Hsin, Wang, H., 2011. The moderating effect of customer perceived value on
online shopping behaviour. Online Inf. Rev. 35 (3), 333–359.
Chang, M.K., Cheung, W., Tang, M., 2013. Building trust online: interactions among trust
building mechanisms. Inf. Manag. 50 (7), 439–445.
Chek, Y.L., Ho, J.S.Y., 2016. Consumer electronics e-retailing: why the alliance of
vendors’ e-service quality, trust and trustworthiness matters. Procedia – Soc. Behav.
Sci. 219, 804–811.
Chin, W., 1998. The partial least squares approach to structural equation modeling. Mod.
Methods Bus. Res. 295 (2), 295–336.
Chiou, J.-S., Droge, C., 2006. Service quality, trust, specific asset investment, and
expertise: direct and indirect effects in satisfaction-loyalty framework. J. Acad. Mark.
Sci. 34 (4), 613–627.
Cho, J.E., Hu, H., 2009. The effect of service quality on trust and commitment varying
across generations. Int. J. Consum. Stud. 33 (4), 468–476.
Cohen, J., 1988. Statistical Power Analysis for the Behavioral Sciences, second ed.
Lawrence Erlbaum Associates, Hillsdale, NJ.
Cox, B., Koelzer, W., 2004. Internet Marketing in Hospitality. Pearson Prentice Hall, New
Jersey.
Cronin, J.J., Brady, M.K., Hult, G.T.M., 2000. Assessing the effects of quality, value, and
customer satisfaction on consumer behavioral intentions in service environments.
J. Retail. 76 (2), 193–218.
Dabholkar, P.A., 1996. Consumer evaluations of new technology-based self-service
options: an investigation of alternative models of service quality. Int. J. Res. Mark. 13
(1), 29–51.
Díaz, E., Koutra, C., 2013. Evaluation of the persuasive features of hotel chains websites: a
latent class segmentation analysis. Int. J. Hosp. Manag. 34 (1), 338–347.
ecommerceIQ, 2018. Indonesia – Order Share by Payment Method.
Ennew, C.T., Banerjee, A.K., Li, D., 2000. Managing word of mouth communication:
empirical evidence from India. Int. J. Bank Mark. 18 (2), 75–83.
Fornell, C., 1992. A national customer satisfaction barometer: the Swedish experience.
J. Mark. 56 (1), 6–21.
Fornell, C., Larcker, D., 1981. Evaluating structural equation models with unobservable
variables and measurement error. J. Mark. Res. 18 (3), 39–50.
Fortes, N., Rita, P., 2016. Privacy concerns and online purchasing behaviour: towards an
integrated model. Eur. Res. Manag. Bus. Econ. 22 (3), 167–176.
Fortes, N., Rita, P., Pagani, M., 2017. The effects of privacy concerns, perceived risk
and trust on online purchasing behaviour. Int. J. Internet Mark. Advert. 11 (4).
Filieri, R., Lin, Z., 2017. The role of aesthetic, cultural, utilitarian and branding factors in
young Chinese consumers’ repurchase intention of smartphone brands. Comput.
Hum. Behav. 67, 139–150.
Gao, F., 2011. A study of online purchase intention: based on the perspective of customer
trust. In: International Conference on Management and Service Science, MASS
2011.
Gefen, D., 2002. Customer loyalty in e-commerce. J. Assoc. Inf. Syst. 3 (1), 27–53.
Gefen, D., Straub, D., Boudreau, M.-C., 2000. Structural equation modeling and
regression: guidelines for research practice. Commun. Assoc. Inf. Syst. 4 (1), 7.
Gounaris, S., Dimitriadis, S., Stathakopoulos, V., 2010. An examination of the effects of
service quality and satisfaction on customers’ behavioral intentions in e-shopping.
J. Serv. Mark. 24 (2–3), 142–156.
Gremler, D.D., Gwinner, K.P., Brown, S.W., 2001. Generating positive word-of-mouth
communication through customer-employee relationships. Int. J. Serv. Ind. Manag.
12 (1), 44–59.
Hair, J.F., Black, W.C., Babin, B.J., Anderson, R.E., 2010. Multivariate Data Analysis.
Vectors.
Hair, J.F., Ringle, C.M., Sarstedt, M., 2011. PLS-SEM: indeed a silver bullet. J. Mark.
Theory Pract. 19 (2), 139–152.
Hellier, P.K., Geursen, G.M., Carr, R.A., Rickard, J.A., 2003. Customer repurchase
intention. Eur. J. Market. 37 (11/12), 1762–1800.
Henkel, D., Houchaime, N., Locatelli, N., Singh, S., Zeithaml, V.A., Bitterner, 2006. The
Impact of Emerging WLANs on Incumbent Cellular Service Providers in the U.S.
McGraw-Hill, Singapore.
Henseler, J., Ringle, C.M., Sarstedt, M., 2014. A new criterion for assessing discriminant
validity in variance-based structural equation modeling. J. Acad. Mark. Sci. 43 (1),
115–135.
Hofstede, G.H., 1984. Culture’s Consequences: International Differences in Work-Related
Values. SAGE Publications, Newbury Park.
Holloway, B.B., Beatty, S.E., 2008. Satisfiers and dissatisfiers in the online environment: a
critical incident assessment. J. Serv. Res. 10 (4), 347–364.
Holloway, B.B., Wang, S., Parish, J.T., 2005. The role of cumulative online purchasing
experience in service recovery management. J. Interact. Mark. 19 (3), 54–66.
Jung, N.Y., Seock, Y.K., 2017. Effect of service recovery on customers’ perceived justice,
satisfaction, and word-of-mouth intentions on online shopping websites. J. Retail.
Consum. Serv. 37, 23–30.
Kansra, P., Jha, A.K., 2016. Measuring service quality in Indian hospitals: an analysis of
SERVQUAL model. Int. J. Serv. Oper. Manag. 24 (1), 1–17.
Kau, A., Loh, E. Wan-Yiun, 2006. The effects of service recovery on consumer satisfaction:
a comparison between complainants and non-complainants. J. Serv. Mark. 20 (2),
101–111.
Keaveney, S.M., 1995. Customer switching behavior in service industries: an exploratory
study. J. Mark. 59 (2), 71–82.
Kim, S., Stoel, L., 2004. Apparel retailers: website quality dimensions and satisfaction.
J. Retail. Consum. Serv. 11 (2), 109–117.
Kitapci, O., Akdogan, C., Dortyol, I.T., 2014. The impact of service quality dimensions on
patient satisfaction, repurchase intentions and word-of-mouth communication in the
public healthcare industry. Procedia – Soc. Behav. Sci. 148, 161–169.
Kotler, P.T., Armstrong, G., 2012. Principles of Marketing, fourteenth ed. Pearson
Prentice Hall, Upper Saddle River.
Kotler, P.T., Keller, K.L., 2006. Marketing Management. Pearson Prentice Hall, New
Jersey.
Lee, G., Lin, H., 2005. Customer perceptions of e-service quality in online shopping. Int.
J. Retail Distrib. Manag. 33 (2), 161–176.
Lee, M.K.O., Turban, E., 2001. A trust model for consumer internet shopping. Int. J.
Electron. Commer. 6 (1), 75–91.
Lee, G., Xia, W., 2010. Toward agile: an integrated analysis of quantitative and qualitative
field data on software development agility. MIS Q. 34 (1), 87–114.
Leung, D., Law, R., Lee, H.A., 2011. The perceived destination image of Hong Kong on
ctrip.com. Int. J. Tour. Res. 13 (2), 124–140.
Liao, T.H., Keng, C.J., 2013. Online shopping delivery delay: finding a psychological
recovery strategy by online consumer experiences. Comput. Hum. Behav. 29 (4),
1849–1861.
Loiacono, E., Watson, R.T., Goodhue, D., 2002. WebQual™: a web site quality instrument.
In: American Marketing Association: Winter Marketing Educators’ Conference, pp.
1–12.
Loureiro, S.M.C., Cavallero, L., Miranda, F.J., 2018. Fashion brands on retail websites:
customer performance expectancy and e-word-of-mouth. J. Retail. Consum. Serv. 41,
131–141.
McLean, G., Wilson, A., 2016. Evolving the online customer experience … is there a role
for online customer support? Comput. Hum. Behav. 60, 602–610.
Mutum, D., Mohd Ghazali, E., Nguyen, B., Arnott, D., 2014. Online loyalty and its
interaction with switching barriers. J. Retail. Consum. Serv. 21 (6), 942–949.
Oliveira, T., Alhinho, M., Rita, P., Dhillon, G., 2017. Modelling and testing consumer trust
dimensions in e-commerce. Comput. Hum. Behav. 71, 153–164.
Pan, X., Ratchford, B.T., Shankar, V., 2002. Can price dispersion in online markets be
explained by differences in e-tailer service quality? J. Acad. Mark. Sci. 30 (4),
433–445.
Parasuraman, A., Zeithaml, V.A., Berry, L.L., 1985. A conceptual model of service quality
and its implications for future research. J. Mark. 49 (4), 41.
Parasuraman, A., Zeithaml, V.A., Malhotra, A., 2005. E-S-QUAL: a multiple-item scale for
assessing electronic service quality. J. Serv. Res. 7 (3), 213–233.
Pereira, H.G., Salgueiro, M. de F., Rita, P., 2016. Online purchase determinants of loyalty:
the mediating effect of satisfaction in tourism. J. Retail. Consum. Serv. 30, 279–291.
Pereira, H.G., de Fátima Salgueiro, M., Rita, P., 2017. Online determinants of e-customer
satisfaction: application to website purchases in tourism. Serv. Bus. 11 (2), 375–403.
Pham, T.S.H., Ahammad, M.F., 2017. Antecedents and consequences of online customer
satisfaction: a holistic process perspective. Technol. Forecast. Soc. Chang. 124,
332–342.
Pires, G.D., Stanton, J., Rita, P., 2006. The internet, consumer empowerment and
marketing strategies. Eur. J. Market. 40 (9/10), 936–949.
Quora, 2017. E-commerce is Affecting Brick and Mortar Retail, But Not in the Way You
Think.
Rasheed, F.A., Abadi, M.F., 2014. Impact of service quality, trust and perceived value on
customer loyalty in Malaysia services industries. Procedia – Soc. Behav. Sci. 164,
298–304.
Ringle, C.M., Sarstedt, M., Straub, D., 2012. A critical look at the use of PLS-SEM in MIS
Quarterly. MIS Q. 36 (1), iii–xiv.
Saleem, M.A., Zahra, S., Yaseen, A., 2017. Impact of service quality and trust on
repurchase intentions – the case of Pakistan airline industry. Asia Pac. J. Mark. Logist.
29 (5), 1136–1159.
Schmidt, S., Cantallops, A.S., dos Santos, C.P., 2008. The characteristics of hotel websites
and their implications for website effectiveness. Int. J. Hosp. Manag. 27 (4), 504–516.
Sharma, G., Lijuan, W., 2015. The effects of online service quality of e-commerce websites
on user satisfaction. Electron. Libr. 33 (3), 468–485.
Shin, J.I., Chung, K.H., Oh, J.S., Lee, C.W., 2013. The effect of site quality on repurchase
intention in internet shopping through mediating variables: the case of university
students in South Korea. Int. J. Inf. Manag. 33 (3), 453–463.
Smith, D., Menon, S., Sivakumar, K., 2005. Online peer and editorial recommendations,
trust, and choice in virtual markets. J. Interact. Mark. 19 (3), 15–37.
Solomon, M.R., 2015. Consumer Behavior: Buying, Having, and Being. Pearson Education
Limited, Harlow.
Statista, 2018a. Number of Digital Buyers in Indonesia from 2016 to 2022 (in Millions).
Statista, 2018b. Number of Internet Users in Indonesia from 2015 to 2022.
Taylor, D.G., Strutton, D., 2010. Has e-marketing come of age? modeling historical
influences on post-adoption era internet consumer behaviors. J. Bus. Res. 63 (9–10),
950–956.
Tsao, W.-C., Hsieh, M.-T., Lin, T.M.Y., 2016. Intensifying online loyalty! the power of
website quality and the perceived value of consumer/seller relationship. Ind. Manag.
Data Syst. 116 (9), 1987–2010.
Turel, O., Connelly, C.E., 2013. Too busy to help: antecedents and outcomes of
interactional justice in web-based service encounters. Int. J. Inf. Manag. 33 (4),
674–683.
Tuten, T.L., Solomon, M.R., 2015. Social Media Marketing, second ed. SAGE Publications
Ltd, London.
Udo, G.J., Bagchi, K.K., Kirs, P.J., 2010. An assessment of customers’ e-service quality
perception, satisfaction and intention. Int. J. Inf. Manag. 30 (6), 481–492.
Urban, G.L., Amyx, C., Lorenzon, A., 2009. Online trust: state of the art, new frontiers, and
research potential. J. Interact. Mark. 23 (2), 179–190.
Wang, X., 2011. The effect of inconsistent word-of-mouth during the service encounter.
J. Serv. Mark. 25 (4), 252–259.
Wang, L., Law, R., Guillet, B.D., Hung, K., Fong, D.K.C., 2015. Impact of hotel website
quality on online booking intentions: etrust as a mediator. Int. J. Hosp. Manag. 47,
108–115.
Wang, S., Cavusoglu, H., Deng, Z., 2016. Early mover advantage in e-commerce platforms
with low entry barriers: the role of customer relationship management capabilities.
Inf. Manag. 53 (2), 197–206.
Wolfinbarger, M., Gilly, M.C., 2003. eTailQ: dimensionalizing, measuring and predicting
etail quality. J. Retail. 79 (3), 183–198.
Wu, J.J., Chen, Y.H., Chung, Y.S., 2010. Trust factors influencing virtual community
members: a study of transaction communities. J. Bus. Res. 63 (9–10), 1025–1032.
Wu, J.J., Hwang, J.N., Sharkhuu, O., Tsogt-Ochir, B., 2018. Shopping online and off-line?
complementary service quality and image congruence. Asia Pac. Manag. Rev. 23 (1),
30–36.
Zeithaml, V.A., 1988. Consumer perceptions of price, quality, and value: a means-end
model and synthesis of evidence. J. Mark. 52 (3), 2–22.
Zeithaml, V.A., Berry, L.L., Parasuraman, A., 1996. The behavioral consequences of
service quality. J. Mark. 60 (2), 31.
Zeithaml, V.A., Parasuraman, A., Malhotra, A., 2002. Service quality delivery through
web sites: a critical review of extant knowledge. J. Acad. Mark. Sci. 30 (4), 362–375.
P. Rita et al. Heliyon 5 (2019) e02690