# Python Machine Learning Assignment: Automatic Gradient Computation

## ML in a nutshell

Optimization, and machine learning, are intimately connected. At a very coarse level, ML works as follows.

First, you come up somehow with a very complicated model $\hat{y} = M(x, \theta)$, which computes an output $\hat{y}$ as a function of an input $x$ and of a vector of parameters $\theta$. In general, $x$, $y$, and $\theta$ are vectors, as the model has multiple inputs, multiple outputs, and several parameters. The model $M$ needs to be complicated, because only complicated models can represent complicated phenomena; for instance, $M$ can be a multi-layer neural net with parameters $\theta = [\theta_1, \ldots, \theta_k]$, where $k$ is the number of parameters of the model.
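As a concrete toy illustration of "a model is a function of an input and a flat parameter vector," here is a minimal sketch. The shapes, and the idea of packing a weight matrix and bias into one vector `theta`, are assumptions for this example only, not the model you will build:

```python
import numpy as np

def M(x, theta):
    """A toy model: a single linear layer y_hat = W x + b.

    theta packs all parameters into one flat vector, as in the text:
    the first 6 entries form a 2x3 weight matrix W, the last 2 a bias b.
    (These shapes are arbitrary, chosen only for illustration.)
    """
    W = theta[:6].reshape(2, 3)   # 2 outputs, 3 inputs
    b = theta[6:8]
    return W @ x + b

x = np.array([1.0, 2.0, 3.0])        # one 3-dimensional input
theta = np.arange(8, dtype=float)    # k = 8 parameters
y_hat = M(x, theta)                  # a 2-dimensional output
```

Note that `M` is deliberately written as a plain function of `(x, theta)`: everything the optimizer is allowed to change lives in the single vector `theta`.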

Second, you come up with a notion of loss $L$, that is, a measure of how badly the model is doing. For instance, if you have a list of inputs $x_1, \ldots, x_n$ and a list of desired outputs $y_1, \ldots, y_n$, you can use as loss:

$$L(\theta) = \sum_{i=1}^{n} \|y_i - \hat{y}_i\| = \sum_{i=1}^{n} \|y_i - M(x_i, \theta)\| .$$

Here, we wrote $L(\theta)$ because, once the inputs $x_1, \ldots, x_n$ and the desired outputs $y_1, \ldots, y_n$ are chosen, the loss $L$ depends only on $\theta$.
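The loss above is straightforward to compute directly. In this sketch, the one-parameter model `M` is a stand-in chosen purely for illustration; only the shape of `loss` mirrors the formula:

```python
import numpy as np

def M(x, theta):
    # Toy one-parameter model, just for illustration: y_hat = theta[0] * x.
    return theta[0] * x

def loss(theta, xs, ys):
    # L(theta) = sum_i || y_i - M(x_i, theta) ||, matching the formula above.
    return sum(np.linalg.norm(y - M(x, theta)) for x, y in zip(xs, ys))

xs = [np.array([1.0]), np.array([2.0])]
ys = [np.array([2.0]), np.array([4.0])]
print(loss(np.array([2.0]), xs, ys))  # perfect fit: loss 0.0
print(loss(np.array([1.0]), xs, ys))  # off by 1 and by 2: loss 3.0
```

Once `xs` and `ys` are fixed, `loss` really is a function of `theta` alone, which is what makes the optimization below well-defined.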

Once the loss is chosen, you decrease it by computing its gradient with respect to $\theta$. Remembering that $\theta = [\theta_1, \ldots, \theta_k]$,

$$\nabla_\theta L = \left[ \frac{\partial L}{\partial \theta_1}, \ldots, \frac{\partial L}{\partial \theta_k} \right] .$$
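Each partial derivative $\partial L / \partial \theta_j$ can also be approximated numerically by finite differences. This is not autogradient (it is far too slow for real models), but it is a handy sanity check for the gradients you will compute later; the function name and `eps` default here are assumptions for the sketch:

```python
import numpy as np

def numerical_gradient(L, theta, eps=1e-6):
    """Approximate grad_theta L by central finite differences.

    For each coordinate j, perturb theta_j by +/- eps and take
    (L(theta + e) - L(theta - e)) / (2 * eps) as dL/dtheta_j.
    """
    grad = np.zeros_like(theta)
    for j in range(len(theta)):
        e = np.zeros_like(theta)
        e[j] = eps
        grad[j] = (L(theta + e) - L(theta - e)) / (2 * eps)
    return grad

L = lambda th: th[0] ** 2 + 3 * th[1]        # example loss
g = numerical_gradient(L, np.array([2.0, 5.0]))
# True gradient is [2 * theta_1, 3] = [4, 3]
```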

The gradient is a vector that indicates how to tweak $\theta$ to decrease the loss. You then choose a small step size $\delta$, and you update $\theta$ via $\theta := \theta - \delta \nabla_\theta L$. This makes the loss a little bit smaller, and the model a little bit better. If you repeat this step many times, the model will hopefully get (a good bit) better.
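The repeated update $\theta := \theta - \delta \nabla_\theta L$ is gradient descent. A minimal sketch, with the gradient supplied by hand for a one-dimensional example (the function names and defaults are assumptions):

```python
import numpy as np

def gradient_descent(grad_L, theta, delta=0.1, steps=100):
    """Repeat theta := theta - delta * grad L, as described above."""
    for _ in range(steps):
        theta = theta - delta * grad_L(theta)
    return theta

# Minimize L(theta) = (theta - 3)^2, whose minimum is at theta = 3.
grad_L = lambda th: 2.0 * (th - 3.0)
theta = gradient_descent(grad_L, theta=np.array(0.0))
# theta converges toward 3.0
```

If $\delta$ is too large the updates can overshoot and diverge; if it is too small, convergence is needlessly slow. Choosing the step size well is part of the craft.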

The key to pleasant ML is to focus on building the model $M$ in a way that is sufficiently expressive, and on choosing a loss $L$ that is helpful in guiding the optimization. The computation of the gradient is done automatically for you. This capability, called autogradient, is implemented in ML frameworks such as TensorFlow, Keras, and PyTorch.

It is possible to use these advanced ML libraries without ever knowing what is under the hood, and how autogradient works. Here, we will instead dive in, and implement autogradient ourselves.

Building a model $M$ corresponds to building an expression with inputs $x$ and $\theta$. We will provide a representation for expressions that enables both the computation of the expression's value, and its differentiation with respect to any of the inputs. This will enable us to implement autogradient. On this basis, we will be able to implement a simple ML framework.
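To make the idea concrete, here is a stripped-down sketch of such an expression representation. The class and method names (`compute`, `backward`, `grad`) are assumptions for this illustration, not the interface the assignment will specify; the point is that each node can both evaluate itself and propagate derivatives to its children via the chain rule:

```python
class Expr:
    """A node in an expression tree, with a value and an accumulated gradient."""
    def __init__(self, value=None):
        self.value = value
        self.grad = 0.0

    def __add__(self, other):
        return Add(self, other)

    def __mul__(self, other):
        return Mul(self, other)

class V(Expr):
    """A leaf: a variable or constant holding a value."""
    def compute(self):
        return self.value
    def backward(self, d):
        self.grad += d   # accumulate, since a leaf may appear several times

class Add(Expr):
    def __init__(self, left, right):
        super().__init__()
        self.left, self.right = left, right
    def compute(self):
        self.value = self.left.compute() + self.right.compute()
        return self.value
    def backward(self, d):
        # d(l + r)/dl = 1 and d(l + r)/dr = 1
        self.left.backward(d)
        self.right.backward(d)

class Mul(Expr):
    def __init__(self, left, right):
        super().__init__()
        self.left, self.right = left, right
    def compute(self):
        self.value = self.left.compute() * self.right.compute()
        return self.value
    def backward(self, d):
        # d(l * r)/dl = r and d(l * r)/dr = l
        self.left.backward(d * self.right.value)
        self.right.backward(d * self.left.value)

# Build e = x * x + x and evaluate at x = 3: value 12, de/dx = 2x + 1 = 7.
x = V(3.0)
e = x * x + x
e.compute()
e.backward(1.0)
```

Overloading `__add__` and `__mul__` lets ordinary Python syntax like `x * x + x` build the expression tree for us, which is the same trick the real frameworks play.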

We say we, but we mean you. You will implement it; we will just provide guidance.

## Question 1

With these clarifications, we ask you to implement the compute_gradient method, which again must:

## Question 2: Rounding up the implementation

## Question 3: Implementation of the fit function