What is the gradient algorithm
Linear regression is regression represented by a straight line, as opposed to curvilinear regression. If the regression equation of the dependent variable Y on the independent variables X1, X2, ..., Xm is a linear equation, namely μy = β0 + β1*X1 + β2*X2 + ... + βm*Xm, where β0 is a constant term, βi is the regression coefficient of the independent variable Xi, and m is any natural number, then the regression of Y on X1, X2, ..., Xm is called linear regression.
Linear regression with only one independent variable is called simple regression, as shown in the following example:
X stands for the quantity of a certain product, and Y for the total price of that quantity of goods:
x = [0, 1, 2, 3, 4, 5]
y = [0, 17, 45, 55, 85, 99]
Plotted in two-dimensional coordinates, the data looks as follows:
Now if the number of goods X = 6, what is the total estimated price of the goods?
We can clearly see that the total price of goods increases as the number of goods increases, which is a typical linear regression.
Since there is only one independent variable X, we assume a linear regression model: Y = a * X + b
We need to find the most appropriate values of a and b so that the straight line Y = a * X + b fits the trend in the figure above; then we can predict the total price Y for any quantity of goods X.
Least Squares Method:
To find the most suitable a and b, we introduce the least squares method.
The least squares method, also known as least squares estimation, is a common method for estimating population parameters from sample observations. Given n pairs of observed data (x1, y1), (x2, y2), ..., (xn, yn), it determines the best estimate y = f(x) of the relationship between x and y by making the sum H of the squared differences (i.e. deviations) between the observed values and the estimated values as small as possible: H = (y1 - f(x1))^2 + (y2 - f(x2))^2 + ... + (yn - f(xn))^2.
The least squares method eliminates the influence of random errors as far as possible, so as to obtain the most reliable and most probable result from a set of observational data.
From the above figure we can clearly see that the straight line Y = a * X + b goes through the origin, i.e. b = 0
We tried different values of a and got the following results:
when a = 19, H = 154
when a = 20, H = 85
when a = 21, H = 126
The corresponding plots are as follows:
We can roughly conclude that when a = 20 and b = 0, the linear model Y = 20 * X fits the sample data better.
So if the quantity of goods is X = 6, we can roughly estimate the total price Y = 20 * 6 = 120
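The trial-and-error above can be sketched in a few lines of Python (a minimal sketch; the function name `squared_error` is ours, not from the original code):

```python
# Sample data from the example above.
x = [0, 1, 2, 3, 4, 5]
y = [0, 17, 45, 55, 85, 99]

def squared_error(a, b=0):
    """Sum H of squared deviations between the observed y and the line a*x + b."""
    return sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))

for a in (19, 20, 21):
    print(f"a = {a}, H = {squared_error(a)}")  # H = 154, 85, 126
```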
Linear regression with more than one independent variable is called multiple regression.
The above example has only one independent variable and is relatively easy to handle; but suppose there are m independent variables, [X1, X2, X3, ..., Xm].
In this case the regression coefficients (i.e. weights) we assume must also number m, i.e. the linear model we assume is Y = W0 + X1*W1 + X2*W2 + X3*W3 + ... + Xm*Wm
To simplify the calculation, we set X0 = 1.
In this way: Y = X0*W0 + X1*W1 + X2*W2 + X3*W3 + ... + Xm*Wm
Written in vector form:
W = [W0, W1, W2, W3, ..., Wm]
X = [X0, X1, X2, X3, ..., Xm]
Y = W^T * X (where W^T is the transpose of the vector W)
The sum of the squares of the differences (i.e. deviations) between the observed values and the estimated values is:
H = Σ (j = 1 to n) ( Y(j) - Σ (i = 0 to m) Wi * Xi(j) )^2
To make the later calculation easier, we multiply H by one half, namely:
H = 1/2 * Σ (j = 1 to n) ( Y(j) - Σ (i = 0 to m) Wi * Xi(j) )^2
In the above formula, n stands for the number of training samples and m for the number of features (independent variables) of each training sample. The superscript (j) means that a quantity belongs to the jth sample, the subscript i denotes the ith feature (independent-variable value), and Y(j) represents the observed total price of the jth sample.
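For concreteness, the halved cost H can be written as a short function (a sketch; the names are ours, not from the original code). Each sample is a feature vector x with x[0] = 1, paired with an observed value y:

```python
def cost(w, xs, ys):
    """H = 1/2 * sum over samples j of (Y(j) - sum_i Wi * Xi(j))**2."""
    total = 0.0
    for x, y in zip(xs, ys):
        estimate = sum(wi * xi for wi, xi in zip(w, x))  # W^T * X for one sample
        total += (y - estimate) ** 2
    return total / 2
```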
Now H is a function of W0, W1, W2, ..., Wm. We need to find, by some suitable method, the values of W that minimize H in order to obtain a good linear regression equation. Unlike in simple regression, it is difficult to try out different values of W by inspection; we have to use an optimization algorithm.
Common optimization algorithms include gradient descent, Newton's method and quasi-Newton methods, the conjugate gradient method, and heuristic optimization methods. In this article the gradient algorithm is presented in detail.
Let us clarify our current goal: using the gradient algorithm, we must find the values W0, W1, W2, W3, ..., Wm that make H smallest, and then write down the regression equation.
The gradient algorithm is divided into the gradient ascent algorithm and the gradient descent algorithm. The basic idea of the gradient descent algorithm is: to find the minimum value of a function, it is best to search along the negative gradient direction of the function; for gradient ascent the opposite holds. For a function f(x, y) of two variables x and y, the gradient is expressed as: ∇f(x, y) = ( ∂f/∂x , ∂f/∂y ).
For Z = f(x, y), using the gradient descent algorithm means moving by -∂f/∂x along the x-axis and by -∂f/∂y along the y-axis; the function f(x, y) must be defined and differentiable at the point being evaluated.
It can be understood as: starting from some point, we repeatedly take a small step in the direction in which f decreases fastest, until we approach a minimum.
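As an illustration (our own toy example, not from the original post), gradient descent on f(x, y) = x^2 + y^2, whose gradient is (2x, 2y), walks any starting point down to the minimum at (0, 0):

```python
def descend(x, y, alpha=0.1, steps=100):
    """Take `steps` gradient descent steps on f(x, y) = x**2 + y**2."""
    for _ in range(steps):
        x -= alpha * 2 * x  # step opposite the partial derivative df/dx = 2x
        y -= alpha * 2 * y  # step opposite the partial derivative df/dy = 2y
    return x, y

print(descend(5.0, -3.0))  # both coordinates approach 0
```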
We now use gradient descent to find the minimum of H.
As we saw earlier, H is a function of W = [W0, W1, W2, W3, ..., Wm], and the gradient of H is:
∇H = ( ∂H/∂W0 , ∂H/∂W1 , ... , ∂H/∂Wm )
At this point, the gradient with respect to each Wi is:
∂H/∂Wi = - Σ (j = 1 to n) ( Y(j) - W^T * X(j) ) * Xi(j)
We assume the step size of each update along the (negative) gradient direction is α, so the update formula for each Wi can be written as:
Wi = Wi + α * Σ (j = 1 to n) ( Y(j) - W^T * X(j) ) * Xi(j)
So the pseudocode of the gradient descent algorithm is as follows:
Initialize each regression coefficient (each W value) to 1
Repeat R times:
    Calculate the gradient over the entire data set
    Update the regression coefficients W with the update formula above
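The pseudocode can be sketched as follows (our own illustration: the data here is synthetic, generated from an assumed relationship Y = 2 + 3*X1, because the post's commodity table is not reproduced here; every feature vector gets X0 = 1 prepended):

```python
def batch_gradient_descent(xs, ys, alpha=0.01, repetitions=1000):
    """xs: feature vectors with X0 = 1 prepended; ys: observed values."""
    m = len(xs[0])
    w = [1.0] * m                       # every regression coefficient starts at 1
    for _ in range(repetitions):        # repeat R times
        # error for each sample j: Y(j) - W^T * X(j)
        errors = [y - sum(wi * xi for wi, xi in zip(w, x))
                  for x, y in zip(xs, ys)]
        for i in range(m):              # Wi = Wi + alpha * sum_j error_j * Xi(j)
            w[i] += alpha * sum(e * x[i] for e, x in zip(errors, xs))
    return w

# Synthetic data from the assumed line Y = 2 + 3*X1.
xs = [[1.0, float(x1)] for x1 in range(6)]
ys = [2 + 3 * x[1] for x in xs]
print(batch_gradient_descent(xs, ys))  # approaches [2.0, 3.0]
```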
Use the gradient descent algorithm to find the linear regression equation of the following commodity data
We assume the linear regression model is: total price Y = a + b * X1 + c * X2 (X1 and X2 represent the quantities of good 1 and good 2).
We need to solve for the regression coefficients W = [a, b, c].
The gradient descent algorithm is as follows:
The regression coefficients we get with the above algorithm are:
To reduce the computation, we can instead update the regression coefficients W with one sample at a time (stochastic gradient descent). The revised algorithm is as follows:
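A sketch of the revised, stochastic variant, again on synthetic data from an assumed line Y = 2 + 3*X1 (the original post's data and code are not reproduced here): the coefficients are updated from one sample at a time instead of from the gradient of the whole data set.

```python
def stochastic_gradient_descent(xs, ys, alpha=0.01, epochs=500):
    """Update W once per sample instead of once per pass over all samples."""
    m = len(xs[0])
    w = [1.0] * m
    for _ in range(epochs):
        for x, y in zip(xs, ys):                 # one sample at a time
            error = y - sum(wi * xi for wi, xi in zip(w, x))
            for i in range(m):
                w[i] += alpha * error * x[i]     # Wi = Wi + alpha * error * Xi
    return w

# Synthetic data from the assumed line Y = 2 + 3*X1 (X0 = 1 prepended).
xs = [[1.0, float(x1)] for x1 in range(6)]
ys = [2 + 3 * x[1] for x in xs]
print(stochastic_gradient_descent(xs, ys))  # approaches [2.0, 3.0]
```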
The calculated regression coefficients are:
We can get the linear regression equation as:
Y = 1.27 + 4.31 * X1 + 5.28 * X2
The full code for this article has been uploaded to: https://gitee.com/beiyan/machine_learning/tree/master/gradient
The stochastic gradient descent algorithm is widely used and works very well; later articles will use gradient algorithms to solve other problems. The gradient algorithm is not without shortcomings, however: for example, convergence slows as it approaches the minimum, line searches can run into problems, and the path can "zigzag". In addition, the choice of step size for descent (or ascent) also influences the final regression coefficients, so we can test the effect of the regression by changing some of these parameters.
Reprinted at: https://www.cnblogs.com/beiyan/p/8404817.html