Multivariable regression model cost does not decrease


【Title】Multivariable regression model cost does not decrease 【Posted】2020-05-03 21:06:00 【Problem Description】:

I am trying to implement a multivariable regression model with mean squared error as the cost function and gradient descent to optimize the parameters. Over 1000 iterations the cost does not decrease. I am not sure whether I implemented the gradients correctly. Also, how can I incorporate the bias? I know that for a simple linear model the bias is the y-intercept, but how do I implement it here?
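For reference, the quantities the code below is meant to implement, assuming the usual half-mean-squared-error convention (matching the 1/(2m) factor in cost_function) and an intercept/bias term $w_0$, are:

$$\hat{y} = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3, \qquad J(w) = \frac{1}{2m}\sum_{i=1}^{m}\left(y^{(i)} - \hat{y}^{(i)}\right)^2$$

$$\frac{\partial J}{\partial w_j} = -\frac{1}{m}\sum_{i=1}^{m} x_j^{(i)}\left(y^{(i)} - \hat{y}^{(i)}\right)\ (j = 1, 2, 3), \qquad \frac{\partial J}{\partial w_0} = -\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)} - \hat{y}^{(i)}\right)$$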

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn import datasets

class LinearRegression:
    def __init__(self, learning_rate=0.0001, n_iters=1000):
        self.lr = learning_rate
        self.n_iters = n_iters
        #since we have three independent variables, we initialize three weights with zeros
        self.weights = np.array([[0.0],[0.0],[0.0]])

    def update_param(self, x_featureset, y_targets, weights):
        """
        x_featureset - (160,3)
        y_targets - (160,1)
        predictions - (160,1)
        weights - (3,1)
        """
        predictions = self.predict(x_featureset, weights)

        #extract the features
        x1 = x_featureset[:,0]
        x2 = x_featureset[:,1]
        x3 = x_featureset[:,2]

        #calculate partial derivatives
        d_w1 = -x1*(y_targets - predictions)
        d_w2 = -x2*(y_targets - predictions)
        d_w3 = -x2*(y_targets - predictions)

        #multiply derivative by learning rate and subtract from our weights
        weights[0][0] -= (self.lr*np.mean(d_w1))
        weights[1][0] -= (self.lr*np.mean(d_w2))
        weights[2][0] -= (self.lr*np.mean(d_w3))

        return weights

    def cost_function(self, x_featureset, y_targets, weights):
        """
        x_featureset - (160,3)
        y_targets - (160,1)
        predictions - (160,1)
        weights - (3,1)
        """

        total_observation = len(y_targets)
        predictions = self.predict(x_featureset, weights)
        sq_error = (y_targets-predictions)**2
        return 1.0/(2*total_observation) * sq_error.sum()

    def normalize(self, x_featureset):
        """
        x_featureset - (160,3)
        x_featureset.T - (3,160)
        """
        for features in x_featureset.T:
            fmean = np.mean(features)
            frange = np.amax(features) - np.amin(features)

            #vector subtraction
            features -= fmean
            #vector division
            features /= frange

        return x_featureset

    def train(self, x, y):
        cost_history = []
        #normalize independent variables
        x = self.normalize(x)
        for i in range(self.n_iters):
            self.weights = self.update_param(x, y, self.weights)
            cost = self.cost_function(x,y, self.weights)
            cost_history.append(cost)
            #log progress
            if i % 10 == 0:
                print("cost: {}".format(cost))

    def predict(self, x_featureset, weights):
        """
        featureset - (160,3)
        weights - (3,1)
        predictions - (160,1)
        """
        y_predicted = np.dot(x_featureset, weights)
        return y_predicted

#generating sample data using sklearn
def generate_data():
    x, y = datasets.make_regression(n_samples=200, n_features=3, noise=20, random_state=4)
    x_train, x_test, y_train, y_test = train_test_split(x,y, test_size=0.2, random_state=1234)
    return (x_train, x_test, y_train, y_test)

#create model instance
model = LinearRegression()
x_train, x_test, y_train, y_test = generate_data()

#fit the data
model.train(x_train, y_train)

【Comments】:

【Answer 1】:

I would suggest the following code, which performs the multivariable regression for the model (it adds a bias weight W0):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn import datasets

class LinearRegression:
    def __init__(self, learning_rate=0.0001, n_iters=1000):
        self.lr = learning_rate
        self.n_iters = n_iters
        #since we have three independent variables plus a bias, we initialize four weights with zeros
        self.weights = np.array([[0.0],[0.0],[0.0],[0.0]])

    def update_param(self, x_featureset, y_targets, weights):
        """
        x_featureset - (160,3)
        y_targets - (160,1)
        predictions - (160,1)
        weights - (4,1)
        """
        predictions = self.predict(x_featureset, weights)

        #extract the features
        x1 = x_featureset[:,0]
        x2 = x_featureset[:,1]
        x3 = x_featureset[:,2]

        #calculate partial derivatives
        d_w0 = - (y_targets - predictions)
        d_w1 = -x1*(y_targets - predictions)
        d_w2 = -x2*(y_targets - predictions)
        d_w3 = -x3*(y_targets - predictions)

        #multiply derivative by learning rate and subtract from our weights
        weights[0][0] -= (self.lr * np.mean(d_w0))
        weights[1][0] -= (self.lr * np.mean(d_w1))
        weights[2][0] -= (self.lr * np.mean(d_w2))
        weights[3][0] -= (self.lr * np.mean(d_w3))

        return weights

    def cost_function(self, x_featureset, y_targets, weights):
        """
        x_featureset - (160,3)
        y_targets - (160,1)
        predictions - (160,1)
        weights - (4,1)
        """

        total_observation = len(y_targets)
        predictions = self.predict(x_featureset, weights)
        sq_error = (y_targets-predictions)**2
        return 1.0/(2*total_observation) * sq_error.sum()

    def normalize(self, x_featureset):
        """
        x_featureset - (160,3)
        x_featureset.T - (3,160)
        """
        for features in x_featureset.T:
            fmean = np.mean(features)
            frange = np.amax(features) - np.amin(features)

            #vector subtraction
            features -= fmean
            #vector division
            features /= frange

        return x_featureset

    def train(self, x, y):
        cost_history = []
        #normalize independent variables
        x = self.normalize(x)
        for i in range(self.n_iters):
            self.weights = self.update_param(x, y, self.weights)
            cost = self.cost_function(x,y, self.weights)
            cost_history.append(cost)
            #log progress
            if i % 10 == 0:
                print("cost: {}".format(cost))

    def predict(self, x_featureset, weights):
        """
        featureset - (160,3)
        weights - (4,1)
        predictions - (160,1)
        """
        # Y = W0 + W1* X1 + W2 * X2 + W3 * X3 
        y_predicted = weights[0,:]+np.dot(x_featureset, weights[1:,:])
        return y_predicted

#generating sample data using sklearn
def generate_data():
    x, y = datasets.make_regression(n_samples=200, n_features=3, noise=20, random_state=4)
    x_train, x_test, y_train, y_test = train_test_split(x,y, test_size=0.2, random_state=1234)
    return (x_train, x_test, y_train, y_test)

#create model instance
model = LinearRegression()
x_train, x_test, y_train, y_test = generate_data()

#fit the data
model.train(x_train, y_train)

Output:

cost: 980808.7969914433
cost: 980757.9150537294
cost: 980707.1372473323
cost: 980656.4633691043
cost: 980605.8932163038
cost: 980555.4265865949
cost: 980505.0632780452
cost: 980454.8030891262
cost: 980404.6458187121
cost: 980354.5912660785
cost: 980304.6392309029
cost: 980254.7895132622
cost: 980205.0419136335
cost: 980155.3962328921
cost: 980105.8522723109
cost: 980056.4098335612
cost: 980007.0687187086
cost: 979957.8287302161
cost: 979908.6896709399
cost: 979859.6513441313
cost: 979810.7135534338
cost: 979761.8761028836
cost: 979713.138796909
cost: 979664.5014403281
cost: 979615.9638383496
cost: 979567.5257965708
cost: 979519.1871209786
cost: 979470.9476179467
cost: 979422.8070942352
cost: 979374.7653569917
cost: 979326.8222137484
cost: 979278.9774724214
cost: 979231.2309413117
cost: 979183.5824291029
cost: 979136.031744861
cost: 979088.578698033
cost: 979041.2230984472
cost: 978993.9647563117
cost: 978946.8034822136
cost: 978899.7390871193
cost: 978852.7713823715
cost: 978805.9001796913
cost: 978759.1252911751
cost: 978712.4465292948
cost: 978665.8637068978
cost: 978619.3766372039
cost: 978572.9851338081
cost: 978526.689010676
cost: 978480.4880821462
cost: 978434.3821629275
cost: 978388.3710680995
cost: 978342.4546131112
cost: 978296.6326137795
cost: 978250.9048862904
cost: 978205.271247197
cost: 978159.7315134181
cost: 978114.2855022394
cost: 978068.9330313113
cost: 978023.6739186477
cost: 977978.5079826281
cost: 977933.4350419926
cost: 977888.4549158453
cost: 977843.5674236503
cost: 977798.7723852337
cost: 977754.0696207809
cost: 977709.4589508367
cost: 977664.9401963042
cost: 977620.5131784454
cost: 977576.177718878
cost: 977531.9336395771
cost: 977487.7807628732
cost: 977443.7189114518
cost: 977399.7479083528
cost: 977355.8675769694
cost: 977312.0777410483
cost: 977268.3782246873
cost: 977224.7688523371
cost: 977181.2494487979
cost: 977137.8198392204
cost: 977094.4798491052
cost: 977051.2293043006
cost: 977008.0680310033
cost: 976964.9958557582
cost: 976922.0126054548
cost: 976879.1181073303
cost: 976836.3121889662
cost: 976793.5946782889
cost: 976750.9654035685
cost: 976708.4241934177
cost: 976665.9708767924
cost: 976623.6052829901
cost: 976581.3272416494
cost: 976539.1365827485
cost: 976497.0331366067
cost: 976455.0167338816
cost: 976413.0872055686
cost: 976371.2443830017
cost: 976329.488097852
cost: 976287.8181821259
cost: 976246.2344681664

Update:

Tested again with lr=0.001 (the learning rate above converges too slowly) and ran for 100000 iterations. I found that the model converges around the following cost values.

cost: 959301.8925571552
cost: 959298.6367338672
cost: 959296.3380453996
cost: 959294.9824055596
cost: 959294.5560072181
cost: 959295.0453167808
cost: 959296.4370687702
cost: 959298.7182605114
cost: 959301.8761469286
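
For completeness, a minimal sketch of how that longer run could be reproduced with the class above (assuming the same generate_data split; the hyperparameters are simply the constructor arguments already defined):

#sketch: re-run the model with the update's settings (lr=0.001, 100000 iterations)
model = LinearRegression(learning_rate=0.001, n_iters=100000)
x_train, x_test, y_train, y_test = generate_data()
#cost is printed every 10 iterations inside train()
model.train(x_train, y_train)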

【Comments】:

【Answer 2】:

First of all, there is an error (a typo) in one of the expressions in your code.

d_w1 = -x1*(y_targets - predictions)
d_w2 = -x2*(y_targets - predictions)
d_w3 = -x2*(y_targets - predictions)

It should be:

d_w1 = -x1*(y_targets - predictions)
d_w2 = -x2*(y_targets - predictions)
d_w3 = -x3*(y_targets - predictions)

Now this does cause the cost to change a little, but I don't think it has converged to the Global Optimum yet. I will review and update if I can optimize it further.

Output:

cost: 980813.8909325758
cost: 980813.8924092407
cost: 980813.8963470139
cost: 980813.9027458953
cost: 980813.9116058851
cost: 980813.9229269831
cost: 980813.9367091894
cost: 980813.952952504
cost: 980813.9716569266
cost: 980813.9928224577
cost: 980814.0164490971
cost: 980814.0425368445
cost: 980814.0710857003
cost: 980814.1020956644
cost: 980814.1355667366
cost: 980814.171498917
cost: 980814.2098922059
cost: 980814.250746603
cost: 980814.2940621084
cost: 980814.3398387218
cost: 980814.3880764437
cost: 980814.4387752739
cost: 980814.4919352122
cost: 980814.5475562587
cost: 980814.6056384137
cost: 980814.6661816769
cost: 980814.729186048
cost: 980814.7946515278
cost: 980814.8625781157
cost: 980814.9329658119
cost: 980815.0058146162
cost: 980815.0811245289
cost: 980815.1588955498
cost: 980815.239127679
cost: 980815.3218209165
cost: 980815.4069752623
cost: 980815.4945907161
cost: 980815.5846672785
cost: 980815.6772049487
cost: 980815.7722037275
cost: 980815.8696636144
cost: 980815.9695846096
cost: 980816.0719667133
cost: 980816.1768099251
cost: 980816.284114245
cost: 980816.3938796733
cost: 980816.5061062098
cost: 980816.6207938545
cost: 980816.7379426076
cost: 980816.8575524688
cost: 980816.9796234384
cost: 980817.1041555163
cost: 980817.2311487021
cost: 980817.3606029968
cost: 980817.4925183991
cost: 980817.6268949099
cost: 980817.7637325292
cost: 980817.9030312565
cost: 980818.0447910922
cost: 980818.1890120357
cost: 980818.3356940881
cost: 980818.4848372485
cost: 980818.6364415172
cost: 980818.7905068938
cost: 980818.9470333789
cost: 980819.1060209726
cost: 980819.267469674
cost: 980819.431379484
cost: 980819.5977504023
cost: 980819.7665824285
cost: 980819.9378755633
cost: 980820.1116298061
cost: 980820.2878451576
cost: 980820.4665216169
cost: 980820.6476591846
cost: 980820.8312578606
cost: 980821.0173176448
cost: 980821.2058385374
cost: 980821.3968205382
cost: 980821.5902636473
cost: 980821.7861678643
cost: 980821.9845331898
cost: 980822.1853596235
cost: 980822.3886471657
cost: 980822.594395816
cost: 980822.8026055746
cost: 980823.0132764415
cost: 980823.2264084164
cost: 980823.4420014997
cost: 980823.6600556913
cost: 980823.880570991
cost: 980824.1035473992
cost: 980824.3289849155
cost: 980824.55688354
cost: 980824.787243273
cost: 980825.0200641142
cost: 980825.2553460634
cost: 980825.4930891211
cost: 980825.733293287
cost: 980825.9759585612

【Comments】:

Thanks for your input. Why is the cost so high? Is it because of a normalization issue?
Possibly. The weight updates during execution look fine. Since this is a linear regression model and you are optimizing with least squares, the high cost may simply mean that the best fit underfits the training data (or does not generalize well). One would have to study the dataset first to say more.
That makes sense. Also, why do we need the weight (W0) in the equation? Can you explain how it helps the model?
W0 is the intercept, or bias, added to the model so that it represents a full linear equation, Y = MX + C in matrix form, which is essentially what linear regression is.
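
Regarding the W0 question in the comments above, a common equivalent formulation is to prepend a column of ones to the feature matrix, so the bias is carried inside the weight vector and a single np.dot covers the intercept. A minimal sketch (not the answer's exact code; the function name is illustrative):

import numpy as np

#sketch: bias folded into the weights via a leading column of ones
#x has shape (m, 3); x_aug has shape (m, 4); weights has shape (4, 1)
def predict_with_bias(x, weights):
    x_aug = np.hstack([np.ones((x.shape[0], 1)), x])
    return np.dot(x_aug, weights)   #equals weights[0] + x.dot(weights[1:])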
