Some basic concepts in the context of hypothesis testing are explained below:
1) Null Hypotheses and Alternative Hypotheses: In the context of statistical analysis, we often talk about null and alternative hypotheses. If we are to compare the superiority of method A with that of method B and we proceed on the assumption that both methods are equally good, then this assumption is termed as a null hypothesis. On the other hand, if we think that method A is superior, then it is known as an alternative hypothesis.
These are symbolically represented as:
Null hypothesis = H0 and Alternative hypothesis = Ha
Suppose we want to test the hypothesis that the population mean is equal to the hypothesized mean, µH0 = 100. Then we would say that the null hypothesis is that the population mean is equal to the hypothesized mean 100, and symbolically we can express it as: H0: µ = µH0 = 100
If our sample results do not support this null hypothesis, we should conclude that something else is true. What we conclude on rejecting the null hypothesis is known as the alternative hypothesis. If we accept H0, then we are rejecting Ha, and if we reject H0, then we are accepting Ha. For H0: µ = µH0 = 100, we may consider three possible alternative hypotheses as follows:
Alternative Hypothesis    To be read as follows
Ha: µ ≠ µH0               The population mean is not equal to 100 (i.e., it may be more or less than 100)
Ha: µ > µH0               The population mean is greater than 100
Ha: µ < µH0               The population mean is less than 100
The null hypotheses and the alternative hypotheses are chosen before the sample is drawn (the researcher must avoid the error of deriving hypotheses from the data he collects and testing the hypotheses from the same data). In the choice of null hypothesis, the following considerations are usually kept in view:
a. The alternative hypothesis is usually the one, which is to be proved, and the null hypothesis is the one that is to be disproved. Thus a null hypothesis represents the hypothesis we are trying to reject, while the alternative hypothesis represents all other possibilities.
b. If the rejection of a certain hypothesis when it is actually true involves great risk, it is taken as null hypothesis, because then the probability of rejecting it when it is true is α (the level of significance) which is chosen very small.
c. The null hypothesis should always be a specific hypothesis i.e., it should not state an approximate value.
Generally, in hypothesis testing, we proceed on the basis of the null hypothesis, keeping the alternative hypothesis in view. Why so? The answer is that on the assumption that the null hypothesis is true, one can assign the probabilities to different possible sample results, but this cannot be done if we proceed with alternative hypotheses. Hence the use of null hypotheses (at times also known as statistical hypotheses) is quite frequent.
2) The Level of Significance: This is a very important concept in the context of hypothesis testing. It is always some percentage (usually 5%), which should be chosen with great care, thought and reason. If we take the significance level at 5%, this implies that H0 will be rejected when the sampling result (i.e., observed evidence) has a less than 0.05 probability of occurring if H0 is true. In other words, the 5% level of significance means that the researcher is willing to take as much as a 5% risk of rejecting the null hypothesis when it (H0) happens to be true. Thus the significance level is the maximum value of the probability of rejecting H0 when it is true, and is usually determined in advance, before testing the hypothesis.
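The decision procedure described above can be sketched in code. The following is a minimal illustration, assuming a two-sided z-test with a known population standard deviation; the sample figures (mean 104, σ = 12, n = 36) are hypothetical numbers chosen only for the example:

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function via erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def z_test_two_sided(sample_mean: float, mu0: float, sigma: float,
                     n: int, alpha: float = 0.05):
    """Two-sided z-test of H0: mu = mu0 (known sigma assumed).

    Returns (z statistic, p-value, whether to reject H0 at level alpha)."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    p = 2.0 * (1.0 - normal_cdf(abs(z)))
    return z, p, p < alpha

# Hypothetical sample: n = 36 observations, mean 104, sigma assumed 12
z, p, reject = z_test_two_sided(104, 100, 12, 36)
```

Here the observed evidence has probability p ≈ 0.046 of occurring if H0 is true, which is below 0.05, so H0 would be rejected at the 5% level but not at the 1% level.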
3) Decision Rule or Test of Hypotheses: Given a null hypothesis H0 and an alternative hypothesis Ha, we make a rule, known as a decision rule, according to which we accept H0 (i.e., reject Ha) or reject H0 (i.e., accept Ha). For instance, if H0 is that a certain lot is good (there are very few defective items in it), against Ha, that the lot is not good (there are many defective items in it), then we must decide the number of items to be tested and the criterion for accepting or rejecting the hypothesis. We might test 10 items in the lot and plan our decision saying that if there are none or only 1 defective item among the 10, we will accept H0; otherwise we will reject H0 (i.e., accept Ha). This sort of basis is known as a decision rule.
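The lot-inspection rule above can be evaluated numerically: the probability that the rule accepts H0 is a binomial tail sum that depends on the true (unknown) defect rate. The defect rates below (2% for a good lot, 30% for a bad one) are illustrative assumptions, not figures from the text:

```python
from math import comb

def accept_probability(p_defect: float, n: int = 10, max_defects: int = 1) -> float:
    """Probability the decision rule accepts H0, i.e. that at most
    max_defects defective items appear among n sampled items,
    when each item is defective with probability p_defect."""
    return sum(comb(n, k) * p_defect**k * (1 - p_defect)**(n - k)
               for k in range(max_defects + 1))

# If the lot is truly good (say 2% defective), the rule almost always accepts:
good = accept_probability(0.02)   # ≈ 0.98
# If the lot is bad (say 30% defective), the rule usually rejects:
bad = accept_probability(0.30)    # ≈ 0.15
```

Computing these acceptance probabilities for a range of defect rates traces out the rule's operating characteristic, which is how one judges whether "at most 1 defective in 10" is a sensible criterion.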
4) Type I & II Errors: In the context of testing of hypotheses, there are basically two types of errors that we can make. We may reject H0 when H0 is true, and we may accept H0 when it is not true. The former is known as a Type I error and the latter as a Type II error. In other words, a Type I error means rejecting a hypothesis that should have been accepted, and a Type II error means accepting a hypothesis that should have been rejected. Type I error is denoted by α (alpha), also called the level of significance of the test; Type II error is denoted by β (beta).
                 Decision
                 Accept H0                  Reject H0
H0 (true)        Correct decision           Type I error (α error)
H0 (false)       Type II error (β error)    Correct decision
The probability of Type I error is usually determined in advance and is understood as the level of significance of testing the hypotheses. If type I error is fixed at 5%, it means there are about 5 chances in 100 that we will reject H0 when H0 is true. We can control type I error just by fixing it at a lower level. For instance, if we fix it at 1%, we will say that the maximum probability of committing type I error would only be 0.01.
But with a fixed sample size n, when we try to reduce Type I error, the probability of committing Type II error increases; both types of errors cannot be reduced simultaneously, since there is a trade-off between them. Decision makers choose the appropriate level of Type I error by examining the costs or penalties attached to both types of errors. If a Type I error involves the time and trouble of reworking a batch of chemicals that should have been accepted, whereas a Type II error means taking a chance that an entire group of users of this chemical compound will be poisoned, then in such a situation one should prefer a Type I error to a Type II error. As a result, one must set a very high level for Type I error in one's testing technique for the given hypothesis. Hence, in testing of hypotheses, one must make all possible efforts to strike an adequate balance between Type I and Type II errors.
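The α–β trade-off at fixed n can be made concrete for a one-sided z-test. The sketch below computes β for a specific alternative; all the numbers (µ0 = 100, a particular alternative µ1 = 105, σ = 15, n = 36) are hypothetical values chosen purely to illustrate that lowering α raises β:

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_ppf(q: float, lo: float = -10.0, hi: float = 10.0) -> float:
    """Inverse standard normal CDF by bisection (sufficient for a sketch)."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if normal_cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def beta_for_alpha(alpha: float, mu0: float = 100.0, mu1: float = 105.0,
                   sigma: float = 15.0, n: int = 36) -> float:
    """Type II error probability beta for the one-sided test of
    H0: mu = mu0 vs the specific alternative mu = mu1, known sigma, fixed n."""
    se = sigma / math.sqrt(n)
    crit = mu0 + normal_ppf(1 - alpha) * se   # reject H0 if sample mean > crit
    return normal_cdf((crit - mu1) / se)      # P(accept H0 | mu = mu1)
```

With these numbers, tightening α from 0.05 to 0.01 pushes β from roughly 0.36 up to roughly 0.63, which is the trade-off the passage describes.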
5) Two-Tailed Test & One-Tailed Test: In the context of hypothesis testing, these two terms are quite important and must be clearly understood. A two-tailed test rejects the null hypothesis if, say, the sample mean is significantly higher or lower than the hypothesized value of the mean of the population. Such a test is appropriate when we have H0: µ = µH0 and Ha: µ ≠ µH0, which may mean µ > µH0 or µ < µH0. If the significance level is 5% and the two-tailed test is to be applied, the probability of the rejection region will be 0.05 (equally split between the two tails of the curve as 0.025 each) and that of the acceptance region will be 0.95. If we take µH0 = 100 and our sample mean deviates significantly from 100, we shall reject the null hypothesis; otherwise we shall accept it. But there are situations when only a one-tailed test is considered appropriate. A one-tailed test would be used when we are to test, say, whether the population mean is either lower than or higher than some hypothesized value.
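The difference between the two kinds of test shows up directly in the p-value computed from the same test statistic. A minimal sketch, assuming a z statistic and using z = 1.8 as an arbitrary illustrative value:

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def p_values(z: float) -> dict:
    """p-values for the same z statistic under the three alternatives."""
    return {
        "two_tailed (mu != mu0)":  2.0 * (1.0 - normal_cdf(abs(z))),
        "right_tailed (mu > mu0)": 1.0 - normal_cdf(z),
        "left_tailed (mu < mu0)":  normal_cdf(z),
    }

# z = 1.8 is significant at 5% one-tailed (p ≈ 0.036)
# but not two-tailed (p ≈ 0.072)
ps = p_values(1.8)
```

This is why the choice between a one-tailed and a two-tailed alternative must be made before the sample is drawn: the same evidence can fall inside one rejection region and outside the other.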
Parametric statistics is a branch of statistics that assumes the data come from a known type of probability distribution and makes inferences about the parameters of that distribution. Most well-known elementary statistical methods are parametric.
Generally speaking, parametric methods make more assumptions than non-parametric methods. If those extra assumptions are correct, parametric methods can produce more accurate and precise estimates; they are said to have more statistical power. However, if those assumptions are incorrect, parametric methods can be very misleading. For that reason they are often not considered robust. On the other hand, parametric formulae are often simpler to write down and faster to compute. In some, but definitely not all, cases their simplicity makes up for their non-robustness, especially if care is taken to examine diagnostic statistics.
Because parametric statistics require a probability distribution, they are not distribution-free.
Non-parametric models differ from parametric models in that the model structure is not specified a priori but is instead determined from data. The term nonparametric is not meant to imply that such models completely lack parameters but that the number and nature of the parameters are flexible and not fixed in advance.
Kernel density estimation provides better estimates of the density than histograms.
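A kernel density estimate places a smooth bump at each observation and sums them, which is what avoids the blocky, bin-origin-dependent shape of a histogram. A minimal sketch with a Gaussian kernel; the sample data and bandwidth below are arbitrary illustrative choices:

```python
import math

def gaussian_kde(data, bandwidth: float):
    """Return a Gaussian kernel density estimate as a callable density(x).

    Minimal sketch: one Gaussian bump of width `bandwidth` per observation,
    normalized so the estimate integrates to 1."""
    n = len(data)
    norm = n * bandwidth * math.sqrt(2.0 * math.pi)
    def density(x: float) -> float:
        return sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2)
                   for xi in data) / norm
    return density

# Arbitrary sample; the estimate is highest near where the data cluster
f = gaussian_kde([1.0, 2.0, 2.5, 3.0], bandwidth=0.5)
```

In practice the bandwidth plays the role the bin width plays for a histogram, and choosing it well matters more than the choice of kernel.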
Nonparametric regression and semiparametric regression methods have been developed based on kernels, splines, and wavelets.
Data Envelopment Analysis provides efficiency coefficients similar to those obtained by Multivariate Analysis without any distributional assumption.