# Python Machine Learning: Homework with Answers on Automatic Gradient Computation

## ML in a nutshell

Optimization, and machine learning, are intimately connected. At a very coarse level, ML works as follows.

First, you come up somehow with a very complicated model $\hat{y} = M(x, \theta)$, which computes an output $\hat{y}$ as a function of an input $x$ and of a vector of parameters $\theta$. In general, $x$, $y$, and $\theta$ are vectors, as the model has multiple inputs, multiple outputs, and several parameters. The model $M$ needs to be complicated, because only complicated models can represent complicated phenomena; for instance, $M$ can be a multi-layer neural net with parameters $\theta = [\theta_1, \ldots, \theta_k]$, where $k$ is the number of parameters of the model.
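As a concrete illustration, such a model might be a one-hidden-layer network. The function name, the way $\theta$ is packed as a list of arrays, and the layer shapes below are all assumptions chosen just for this sketch, not part of the assignment's API:

```python
import numpy as np

def model(x, theta):
    """Illustrative one-hidden-layer network.

    theta packs all parameters as [W1, b1, W2, b2]; the packing
    and the shapes are assumptions made only for this example.
    """
    W1, b1, W2, b2 = theta
    h = np.tanh(W1 @ x + b1)  # hidden layer with tanh nonlinearity
    return W2 @ h + b2        # linear output layer
```

With, say, a 3-dimensional input, 4 hidden units, and a 2-dimensional output, `theta` would hold a `(4, 3)` matrix, a length-4 bias, a `(2, 4)` matrix, and a length-2 bias.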

Second, you come up with a notion of loss $L$, that is, how badly the model is doing. For instance, if you have a list of inputs $x_1, \ldots, x_n$, and a set of desired outputs $y_1, \ldots, y_n$, you can use as loss:

$$L(\theta) = \sum_{i=1}^{n} ||y_i - \hat{y}_i|| = \sum_{i=1}^{n} ||y_i - M(x_i, \theta)||.$$

Here, we wrote $L(\theta)$ because, once the inputs $x_1, \ldots, x_n$ and the desired outputs $y_1, \ldots, y_n$ are chosen, the loss $L$ depends only on $\theta$.
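The loss above translates directly into code. As a hedged sketch (the function name and signature are our own, and `model` stands for whatever $M$ you have built):

```python
import numpy as np

def loss(theta, xs, ys, model):
    """Sum over the dataset of the Euclidean distance ||y_i - M(x_i, theta)||.

    xs and ys are the lists of inputs and desired outputs; model is a
    callable M(x, theta). Names and signature are illustrative only.
    """
    return sum(np.linalg.norm(y - model(x, theta)) for x, y in zip(xs, ys))
```

Note that for a fixed dataset, `loss` is indeed a function of `theta` alone, which is what we differentiate.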

Once the loss is chosen, you decrease it, by computing its gradient with respect to $\theta$. Remembering that $\theta = [\theta_1, \ldots, \theta_k]$,

$$\nabla_\theta L = \left[ \frac{\partial L}{\partial \theta_1}, \ldots, \frac{\partial L}{\partial \theta_k} \right].$$

The gradient is a vector that indicates how to tweak $\theta$ to decrease the loss: it points in the direction of steepest increase of $L$, so we move against it. You then choose a small step size $\delta$, and you update $\theta$ via $\theta := \theta - \delta \nabla_\theta L$. This makes the loss a little bit smaller, and the model a little bit better. If you repeat this step many times, the model will hopefully get (a good bit) better.
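The update rule can be sketched as the loop below. Since we have not built autogradient yet, a finite-difference approximation stands in for the true gradient; the function names and the default step size are assumptions made for this illustration:

```python
import numpy as np

def numerical_gradient(f, theta, eps=1e-6):
    """Central finite-difference approximation of the gradient of f at theta.

    A stand-in for autogradient, used here only to illustrate the update.
    """
    g = np.zeros_like(theta)
    for i in range(len(theta)):
        t_plus = theta.copy();  t_plus[i] += eps
        t_minus = theta.copy(); t_minus[i] -= eps
        g[i] = (f(t_plus) - f(t_minus)) / (2 * eps)
    return g

def gradient_descent(f, theta, delta=0.1, steps=100):
    """Repeat the update theta := theta - delta * grad(f)(theta)."""
    for _ in range(steps):
        theta = theta - delta * numerical_gradient(f, theta)
    return theta
```

For a simple convex loss this loop converges to the minimizer; for the complicated models above, it typically only finds a good local improvement, which is often enough.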

The key to pleasant ML is to focus on building the model $M$ in a way that is sufficiently expressive, and on choosing a loss $L$ that is helpful in guiding the optimization. The computation of the gradient is done automatically for you. This capability, called autogradient, is implemented in ML frameworks such as TensorFlow, Keras, and PyTorch.

It is possible to use these advanced ML libraries without ever knowing what is under the hood, and how autogradient works. Here, we will instead dive in, and implement autogradient.

Building a model $M$ corresponds to building an expression with inputs $x$ and $\theta$. We will provide a representation for expressions that enables both the calculation of the expression value, and the differentiation with respect to any of the inputs. This will enable us to implement autogradient. On the basis of this, we will be able to implement a simple ML framework.
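To give a feel for what such a representation can look like, here is a minimal sketch of expression classes supporting both evaluation and symbolic differentiation. The class and method names (`Expr`, `V`, `eval`, `derivative`) are our own invention for this illustration, not the assignment's actual API:

```python
class Expr:
    """Base class for expression nodes; operators build larger expressions."""
    def __add__(self, other):
        return Plus(self, other)
    def __mul__(self, other):
        return Times(self, other)

class V(Expr):
    """A variable (leaf) holding a current value."""
    def __init__(self, value):
        self.value = value
    def eval(self):
        return self.value
    def derivative(self, var):
        # d var / d var = 1; any other variable is a constant w.r.t. var.
        return 1.0 if var is self else 0.0

class Plus(Expr):
    def __init__(self, left, right):
        self.left, self.right = left, right
    def eval(self):
        return self.left.eval() + self.right.eval()
    def derivative(self, var):
        # Sum rule.
        return self.left.derivative(var) + self.right.derivative(var)

class Times(Expr):
    def __init__(self, left, right):
        self.left, self.right = left, right
    def eval(self):
        return self.left.eval() * self.right.eval()
    def derivative(self, var):
        # Product rule.
        return (self.left.derivative(var) * self.right.eval()
                + self.left.eval() * self.right.derivative(var))
```

With this sketch, an expression such as `x * theta + theta` can be evaluated, and differentiated with respect to either `x` or `theta`, which is exactly the capability autogradient needs.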

We say we, but we mean you. You will implement it; we will just provide guidance.

Question 1: With these clarifications, we ask you to implement the `compute_gradient` method, which again must:

Question 2: Rounding up the implementation
Question 3: Implementation of the `fit` function
