
Financial Planning & Analytics


Submitted  •  October 7, 2015  •  Synthesis  •  4,041 Words (17 Pages)  •  128 Views


MULTIPLE REGRESSION

INTRODUCTION: The simple linear regression model is used to determine the effect of a single independent variable on the dependent variable.

[pic 1]

However, we are often interested in testing whether a dependent variable (y) is related to more than one independent variable (x2, x3, x4, …), with the emphasis on estimation and inference. This process is known as multiple regression, and it is commonly done. Note, however, that the independent variables may obscure each other's effects. For example, the choice of a restaurant can depend on factors like cost, convenience, and ambience. The cost effect might override the convenience effect, so a regression on cost alone would not appear very interesting.

One possible solution is to perform a regression with one independent variable and then test whether a second independent variable is significant with respect to the residuals from this regression. If it is, the second variable is included. We continue in the same manner to include other independent variables.
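This residual-based check can be sketched as follows. The data, coefficients, and seed below are illustrative assumptions, not taken from the text: we regress y on x2 alone, then regress the residuals on a candidate second variable x3 to see whether it carries a significant effect.

```python
import numpy as np

# Hypothetical data: y in fact depends on both x2 and x3.
rng = np.random.default_rng(0)
n = 100
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
y = 1.0 + 2.0 * x2 + 0.5 * x3 + rng.normal(scale=0.1, size=n)

# Step 1: simple regression of y on x2 alone (intercept + slope).
X1 = np.column_stack([np.ones(n), x2])
b1, *_ = np.linalg.lstsq(X1, y, rcond=None)
residuals = y - X1 @ b1

# Step 2: regress the residuals on x3; a clearly non-zero slope
# suggests x3 should be included in the model.
X2 = np.column_stack([np.ones(n), x3])
b2, *_ = np.linalg.lstsq(X2, residuals, rcond=None)
print(b2[1])
```

Here the residual slope recovers roughly the 0.5 effect of x3, since x2 and x3 were generated independently.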

A multiple regression allows the simultaneous testing and modeling of multiple independent variables.

The model for a multiple regression (called the POPULATION REGRESSION MODEL) takes the form:

[pic 2]
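The population model above can be made concrete with a small simulation. All numbers below (population size, coefficient values, noise scale) are illustrative assumptions: each observation is an intercept plus a weighted sum of the independent variables plus an error term.

```python
import numpy as np

# A minimal sketch of the population regression model
#   y_i = beta1 + beta2 * x_i2 + beta3 * x_i3 + eps_i   (p = 3 here).
rng = np.random.default_rng(1)
m = 1000                              # assumed population size
beta = np.array([1.0, 2.0, -0.5])     # beta1 (intercept), beta2, beta3
X = np.column_stack([np.ones(m), rng.normal(size=(m, 2))])
eps = rng.normal(scale=1.0, size=m)   # error term, mean zero
y = X @ beta + eps                    # one y value per observation
```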

Suppose in the above model that the population size is m (> n). The values of Y1, Y2, …, Ym can then be written out as a system of equations. Observing these equations, we see that they share common parameters: β1, β2, …, βp. The p-dimensional representative plane then looks like

[pic 3]

For any observation the representative plane is used to deduce the observation.

e.g. Suppose the ith observation is 70 and the representative plane gives the value 68; then (70 − 68) is the error term εi = 2. Similarly, if the jth observation is 80 and the representative plane gives 76, then (80 − 76) is the error term εj = 4.

Here there are m observations in the population and p variables (Y, X2, X3, …, Xp); εi is the residual attached to each observation. (The residual is the difference between the actual y value and the value ŷ predicted by the model.)

Thus the system of equations are:

[pic 7]

The above set of equations can be reduced to a matrix form as shown below:

[pic 8]
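The matrix form can be checked numerically. The small design matrix below is an illustrative assumption: its first column is all ones (for the intercept) and each remaining column holds one independent variable, so Y = Xβ + ε reproduces the scalar equations row by row.

```python
import numpy as np

# Illustrative check that the matrix form Y = X beta + eps matches
# the scalar equation y_i = beta1 + beta2*x_i2 + beta3*x_i3 + eps_i.
beta = np.array([1.0, 2.0, 3.0])
X = np.array([[1.0, 0.5, 2.0],    # row i: [1, x_i2, x_i3]
              [1.0, 1.5, 0.0],
              [1.0, 2.5, 1.0]])
eps = np.array([0.1, -0.2, 0.05])
Y = X @ beta + eps

# Scalar form for observation i = 0 agrees with the matrix product:
y0 = beta[0] + beta[1] * X[0, 1] + beta[2] * X[0, 2] + eps[0]
assert np.isclose(Y[0], y0)
```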

In the X matrix, xji corresponds to the jth variable and the ith observation.

But in reality we do not know the population; we therefore collect sample observations, estimate the population parameters from them, and the estimated parameters constitute the sample representative plane.

i.e. Our aim is to estimate the values of β1, β2,…,βp by obtaining data from a sample of the population.  Thus, the sample regression equation is as under.

[pic 9]
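Estimating b1, …, bp from a sample can be sketched with ordinary least squares. The sample size, true coefficients, and noise level below are simulated assumptions; the fitted values ŷ define the sample regression plane.

```python
import numpy as np

# Sketch: estimate b1..bp from a sample, then form the fitted plane
#   y_hat = b1 + b2*x2 + b3*x3.
rng = np.random.default_rng(2)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
true_beta = np.array([1.0, 2.0, -0.5])   # unknown in practice
y = X @ true_beta + rng.normal(scale=0.1, size=n)

b, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS estimates of beta
y_hat = X @ b                              # sample regression plane
```

With a reasonable sample size, the estimates b are close to the population parameters they estimate.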

This concept is portrayed in the diagram below for one independent variable.[pic 10]

        [pic 11]

ŷi is the expected value of yi at xi, E(y | xi), in the population regression model; Ŷi is the expected value of yi at xi in the sample regression model.

Our entire task is to calculate the values of b1, b2, b3, …, bp from the sample drawn from the population and thereby estimate the partial regression coefficients (β1, β2, β3, …, βp) of the population. Here b1 is the estimator of β1, b2 is the estimator of β2, and so on up to bp for βp.

If there are 2 variables (1 dependent and 1 independent) we fit a line; for 3 variables (1 dependent and 2 independent) we fit a plane; and for 4 variables (1 dependent and 3 independent) a hyperplane in space.

Multiple regression techniques have a wide range of applications, from predicting a tornado to evaluating a restaurant for the next meal.

Assumptions: The multiple regression model operates under certain assumptions. These are:

  1. The error term is independent of each of the independent variables, i.e. the covariance cov(ε, Xi) = 0. Since the regression equation is y = f(x) + ε and our aim is to minimize the squared error ε2 = [y − f(x)]2, this becomes difficult if an independent variable influences the error term.
  2. The errors for all possible sets of given values of x2, x3, x4, …, xp are normally distributed.
  3. The expected value of the errors is zero for all possible sets of given values of x2, x3, x4, …, i.e. E(εi) = 0.
  4. The variance of the errors is finite and the same for all sets of given values of x2, x3, …, i.e. variance(εi) = σ2, a constant.
  5. Any two errors are independent, i.e. one error is not the cause of another.
  6. The model should include only those variables which have no relationships among themselves, i.e. no multicollinearity. If there is multicollinearity, one independent variable can be estimated from another, as in

X2 = K1 + K2X3.
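A near-linear relation like X2 = K1 + K2X3 can be detected with the variance inflation factor (VIF), a standard diagnostic not named in the text: VIF of predictor j is 1/(1 − R²j), where R²j comes from regressing xj on the other predictors. The data and thresholds below are illustrative assumptions.

```python
import numpy as np

def vif(X, j):
    """Variance inflation factor of column j of predictor matrix X."""
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])
    coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    resid = X[:, j] - A @ coef
    r2 = 1.0 - resid.var() / X[:, j].var()
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(3)
x3 = rng.normal(size=200)
x2 = 2.0 + 3.0 * x3 + rng.normal(scale=0.01, size=200)  # nearly x2 = k1 + k2*x3
x4 = rng.normal(size=200)
X = np.column_stack([x2, x3, x4])
print(vif(X, 0))   # very large: x2 is almost a linear function of x3
print(vif(X, 2))   # near 1: x4 is unrelated to the other predictors
```

A very large VIF signals that the variable should be dropped or combined with the one it duplicates.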

TYPES OF REGRESSION FOR TYPES OF DATA: Different types of regressions exist for different types of data.

Dependent Variable   Independent Variable                Name of the regression
Metric               Metric                              Ordinary regression
Non-metric           Non-metric                          Logistic regression
Metric               (Metric and non-metric) or          Dummy regression
                     non-metric

Analysis of the Multiple Regression Technique: The foremost task is to find the sample representative regression model. The model is constituted by the values of b. The most commonly used method of arriving at the estimators is ordinary least squares, which is explained as follows:

...
