"
],
"text/plain": [
" Survived Sex_male\n",
"0 0 1\n",
"1 1 0\n",
"2 1 0\n",
"3 1 0\n",
"4 0 1"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dt=dt.drop('Sex_female',axis=1)\n",
"dt.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Hagamos la regresión logística:"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.linear_model import LogisticRegression"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Help on class LogisticRegression in module sklearn.linear_model.logistic:\n",
"\n",
"class LogisticRegression(sklearn.base.BaseEstimator, sklearn.linear_model.base.LinearClassifierMixin, sklearn.linear_model.base.SparseCoefMixin)\n",
" | LogisticRegression(penalty='l2', dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='warn', max_iter=100, multi_class='warn', verbose=0, warm_start=False, n_jobs=None)\n",
" | \n",
" | Logistic Regression (aka logit, MaxEnt) classifier.\n",
" | \n",
" | In the multiclass case, the training algorithm uses the one-vs-rest (OvR)\n",
" | scheme if the 'multi_class' option is set to 'ovr', and uses the cross-\n",
" | entropy loss if the 'multi_class' option is set to 'multinomial'.\n",
" | (Currently the 'multinomial' option is supported only by the 'lbfgs',\n",
" | 'sag' and 'newton-cg' solvers.)\n",
" | \n",
" | This class implements regularized logistic regression using the\n",
" | 'liblinear' library, 'newton-cg', 'sag' and 'lbfgs' solvers. It can handle\n",
" | both dense and sparse input. Use C-ordered arrays or CSR matrices\n",
" | containing 64-bit floats for optimal performance; any other input format\n",
" | will be converted (and copied).\n",
" | \n",
" | The 'newton-cg', 'sag', and 'lbfgs' solvers support only L2 regularization\n",
" | with primal formulation. The 'liblinear' solver supports both L1 and L2\n",
" | regularization, with a dual formulation only for the L2 penalty.\n",
" | \n",
" | Read more in the :ref:`User Guide `.\n",
" | \n",
" | Parameters\n",
" | ----------\n",
" | penalty : str, 'l1' or 'l2', default: 'l2'\n",
" | Used to specify the norm used in the penalization. The 'newton-cg',\n",
" | 'sag' and 'lbfgs' solvers support only l2 penalties.\n",
" | \n",
" | .. versionadded:: 0.19\n",
" | l1 penalty with SAGA solver (allowing 'multinomial' + L1)\n",
" | \n",
" | dual : bool, default: False\n",
" | Dual or primal formulation. Dual formulation is only implemented for\n",
" | l2 penalty with liblinear solver. Prefer dual=False when\n",
" | n_samples > n_features.\n",
" | \n",
" | tol : float, default: 1e-4\n",
" | Tolerance for stopping criteria.\n",
" | \n",
" | C : float, default: 1.0\n",
" | Inverse of regularization strength; must be a positive float.\n",
" | Like in support vector machines, smaller values specify stronger\n",
" | regularization.\n",
" | \n",
" | fit_intercept : bool, default: True\n",
" | Specifies if a constant (a.k.a. bias or intercept) should be\n",
" | added to the decision function.\n",
" | \n",
" | intercept_scaling : float, default 1.\n",
" | Useful only when the solver 'liblinear' is used\n",
" | and self.fit_intercept is set to True. In this case, x becomes\n",
" | [x, self.intercept_scaling],\n",
" | i.e. a \"synthetic\" feature with constant value equal to\n",
" | intercept_scaling is appended to the instance vector.\n",
" | The intercept becomes ``intercept_scaling * synthetic_feature_weight``.\n",
" | \n",
" | Note! the synthetic feature weight is subject to l1/l2 regularization\n",
" | as all other features.\n",
" | To lessen the effect of regularization on synthetic feature weight\n",
" | (and therefore on the intercept) intercept_scaling has to be increased.\n",
" | \n",
" | class_weight : dict or 'balanced', default: None\n",
" | Weights associated with classes in the form ``{class_label: weight}``.\n",
" | If not given, all classes are supposed to have weight one.\n",
" | \n",
" | The \"balanced\" mode uses the values of y to automatically adjust\n",
" | weights inversely proportional to class frequencies in the input data\n",
" | as ``n_samples / (n_classes * np.bincount(y))``.\n",
" | \n",
" | Note that these weights will be multiplied with sample_weight (passed\n",
" | through the fit method) if sample_weight is specified.\n",
" | \n",
" | .. versionadded:: 0.17\n",
" | *class_weight='balanced'*\n",
" | \n",
" | random_state : int, RandomState instance or None, optional, default: None\n",
" | The seed of the pseudo random number generator to use when shuffling\n",
" | the data. If int, random_state is the seed used by the random number\n",
" | generator; If RandomState instance, random_state is the random number\n",
" | generator; If None, the random number generator is the RandomState\n",
" | instance used by `np.random`. Used when ``solver`` == 'sag' or\n",
" | 'liblinear'.\n",
" | \n",
" | solver : str, {'newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'}, default: 'liblinear'.\n",
" | \n",
" | Algorithm to use in the optimization problem.\n",
" | \n",
" | - For small datasets, 'liblinear' is a good choice, whereas 'sag' and\n",
" | 'saga' are faster for large ones.\n",
" | - For multiclass problems, only 'newton-cg', 'sag', 'saga' and 'lbfgs'\n",
" | handle multinomial loss; 'liblinear' is limited to one-versus-rest\n",
" | schemes.\n",
" | - 'newton-cg', 'lbfgs' and 'sag' only handle L2 penalty, whereas\n",
" | 'liblinear' and 'saga' handle L1 penalty.\n",
" | \n",
" | Note that 'sag' and 'saga' fast convergence is only guaranteed on\n",
" | features with approximately the same scale. You can\n",
" | preprocess the data with a scaler from sklearn.preprocessing.\n",
" | \n",
" | .. versionadded:: 0.17\n",
" | Stochastic Average Gradient descent solver.\n",
" | .. versionadded:: 0.19\n",
" | SAGA solver.\n",
" | .. versionchanged:: 0.20\n",
" | Default will change from 'liblinear' to 'lbfgs' in 0.22.\n",
" | \n",
" | max_iter : int, default: 100\n",
" | Useful only for the newton-cg, sag and lbfgs solvers.\n",
" | Maximum number of iterations taken for the solvers to converge.\n",
" | \n",
" | multi_class : str, {'ovr', 'multinomial', 'auto'}, default: 'ovr'\n",
" | If the option chosen is 'ovr', then a binary problem is fit for each\n",
" | label. For 'multinomial' the loss minimised is the multinomial loss fit\n",
" | across the entire probability distribution, *even when the data is\n",
" | binary*. 'multinomial' is unavailable when solver='liblinear'.\n",
" | 'auto' selects 'ovr' if the data is binary, or if solver='liblinear',\n",
" | and otherwise selects 'multinomial'.\n",
" | \n",
" | .. versionadded:: 0.18\n",
" | Stochastic Average Gradient descent solver for 'multinomial' case.\n",
" | .. versionchanged:: 0.20\n",
" | Default will change from 'ovr' to 'auto' in 0.22.\n",
" | \n",
" | verbose : int, default: 0\n",
" | For the liblinear and lbfgs solvers set verbose to any positive\n",
" | number for verbosity.\n",
" | \n",
" | warm_start : bool, default: False\n",
" | When set to True, reuse the solution of the previous call to fit as\n",
" | initialization, otherwise, just erase the previous solution.\n",
" | Useless for liblinear solver. See :term:`the Glossary `.\n",
" | \n",
" | .. versionadded:: 0.17\n",
" | *warm_start* to support *lbfgs*, *newton-cg*, *sag*, *saga* solvers.\n",
" | \n",
" | n_jobs : int or None, optional (default=None)\n",
" | Number of CPU cores used when parallelizing over classes if\n",
" | multi_class='ovr'\". This parameter is ignored when the ``solver`` is\n",
" | set to 'liblinear' regardless of whether 'multi_class' is specified or\n",
" | not. ``None`` means 1 unless in a :obj:`joblib.parallel_backend`\n",
" | context. ``-1`` means using all processors.\n",
" | See :term:`Glossary ` for more details.\n",
" | \n",
" | Attributes\n",
" | ----------\n",
" | \n",
" | classes_ : array, shape (n_classes, )\n",
" | A list of class labels known to the classifier.\n",
" | \n",
" | coef_ : array, shape (1, n_features) or (n_classes, n_features)\n",
" | Coefficient of the features in the decision function.\n",
" | \n",
" | `coef_` is of shape (1, n_features) when the given problem is binary.\n",
" | In particular, when `multi_class='multinomial'`, `coef_` corresponds\n",
" | to outcome 1 (True) and `-coef_` corresponds to outcome 0 (False).\n",
" | \n",
" | intercept_ : array, shape (1,) or (n_classes,)\n",
" | Intercept (a.k.a. bias) added to the decision function.\n",
" | \n",
" | If `fit_intercept` is set to False, the intercept is set to zero.\n",
" | `intercept_` is of shape (1,) when the given problem is binary.\n",
" | In particular, when `multi_class='multinomial'`, `intercept_`\n",
" | corresponds to outcome 1 (True) and `-intercept_` corresponds to\n",
" | outcome 0 (False).\n",
" | \n",
" | n_iter_ : array, shape (n_classes,) or (1, )\n",
" | Actual number of iterations for all classes. If binary or multinomial,\n",
" | it returns only 1 element. For liblinear solver, only the maximum\n",
" | number of iteration across all classes is given.\n",
" | \n",
" | .. versionchanged:: 0.20\n",
" | \n",
" | In SciPy <= 1.0.0 the number of lbfgs iterations may exceed\n",
" | ``max_iter``. ``n_iter_`` will now report at most ``max_iter``.\n",
" | \n",
" | Examples\n",
" | --------\n",
" | >>> from sklearn.datasets import load_iris\n",
" | >>> from sklearn.linear_model import LogisticRegression\n",
" | >>> X, y = load_iris(return_X_y=True)\n",
" | >>> clf = LogisticRegression(random_state=0, solver='lbfgs',\n",
" | ... multi_class='multinomial').fit(X, y)\n",
" | >>> clf.predict(X[:2, :])\n",
" | array([0, 0])\n",
" | >>> clf.predict_proba(X[:2, :]) # doctest: +ELLIPSIS\n",
" | array([[9.8...e-01, 1.8...e-02, 1.4...e-08],\n",
" | [9.7...e-01, 2.8...e-02, ...e-08]])\n",
" | >>> clf.score(X, y)\n",
" | 0.97...\n",
" | \n",
" | See also\n",
" | --------\n",
" | SGDClassifier : incrementally trained logistic regression (when given\n",
" | the parameter ``loss=\"log\"``).\n",
" | LogisticRegressionCV : Logistic regression with built-in cross validation\n",
" | \n",
" | Notes\n",
" | -----\n",
" | The underlying C implementation uses a random number generator to\n",
" | select features when fitting the model. It is thus not uncommon,\n",
" | to have slightly different results for the same input data. If\n",
" | that happens, try with a smaller tol parameter.\n",
" | \n",
" | Predict output may not match that of standalone liblinear in certain\n",
" | cases. See :ref:`differences from liblinear `\n",
" | in the narrative documentation.\n",
" | \n",
" | References\n",
" | ----------\n",
" | \n",
" | LIBLINEAR -- A Library for Large Linear Classification\n",
" | https://www.csie.ntu.edu.tw/~cjlin/liblinear/\n",
" | \n",
" | SAG -- Mark Schmidt, Nicolas Le Roux, and Francis Bach\n",
" | Minimizing Finite Sums with the Stochastic Average Gradient\n",
" | https://hal.inria.fr/hal-00860051/document\n",
" | \n",
" | SAGA -- Defazio, A., Bach F. & Lacoste-Julien S. (2014).\n",
" | SAGA: A Fast Incremental Gradient Method With Support\n",
" | for Non-Strongly Convex Composite Objectives\n",
" | https://arxiv.org/abs/1407.0202\n",
" | \n",
" | Hsiang-Fu Yu, Fang-Lan Huang, Chih-Jen Lin (2011). Dual coordinate descent\n",
" | methods for logistic regression and maximum entropy models.\n",
" | Machine Learning 85(1-2):41-75.\n",
" | https://www.csie.ntu.edu.tw/~cjlin/papers/maxent_dual.pdf\n",
" | \n",
" | Method resolution order:\n",
" | LogisticRegression\n",
" | sklearn.base.BaseEstimator\n",
" | sklearn.linear_model.base.LinearClassifierMixin\n",
" | sklearn.base.ClassifierMixin\n",
" | sklearn.linear_model.base.SparseCoefMixin\n",
" | builtins.object\n",
" | \n",
" | Methods defined here:\n",
" | \n",
" | __init__(self, penalty='l2', dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='warn', max_iter=100, multi_class='warn', verbose=0, warm_start=False, n_jobs=None)\n",
" | Initialize self. See help(type(self)) for accurate signature.\n",
" | \n",
" | fit(self, X, y, sample_weight=None)\n",
" | Fit the model according to the given training data.\n",
" | \n",
" | Parameters\n",
" | ----------\n",
" | X : {array-like, sparse matrix}, shape (n_samples, n_features)\n",
" | Training vector, where n_samples is the number of samples and\n",
" | n_features is the number of features.\n",
" | \n",
" | y : array-like, shape (n_samples,)\n",
" | Target vector relative to X.\n",
" | \n",
" | sample_weight : array-like, shape (n_samples,) optional\n",
" | Array of weights that are assigned to individual samples.\n",
" | If not provided, then each sample is given unit weight.\n",
" | \n",
" | .. versionadded:: 0.17\n",
" | *sample_weight* support to LogisticRegression.\n",
" | \n",
" | Returns\n",
" | -------\n",
" | self : object\n",
" | \n",
" | predict_log_proba(self, X)\n",
" | Log of probability estimates.\n",
" | \n",
" | The returned estimates for all classes are ordered by the\n",
" | label of classes.\n",
" | \n",
" | Parameters\n",
" | ----------\n",
" | X : array-like, shape = [n_samples, n_features]\n",
" | \n",
" | Returns\n",
" | -------\n",
" | T : array-like, shape = [n_samples, n_classes]\n",
" | Returns the log-probability of the sample for each class in the\n",
" | model, where classes are ordered as they are in ``self.classes_``.\n",
" | \n",
" | predict_proba(self, X)\n",
" | Probability estimates.\n",
" | \n",
" | The returned estimates for all classes are ordered by the\n",
" | label of classes.\n",
" | \n",
" | For a multi_class problem, if multi_class is set to be \"multinomial\"\n",
" | the softmax function is used to find the predicted probability of\n",
" | each class.\n",
" | Else use a one-vs-rest approach, i.e calculate the probability\n",
" | of each class assuming it to be positive using the logistic function.\n",
" | and normalize these values across all the classes.\n",
" | \n",
" | Parameters\n",
" | ----------\n",
" | X : array-like, shape = [n_samples, n_features]\n",
" | \n",
" | Returns\n",
" | -------\n",
" | T : array-like, shape = [n_samples, n_classes]\n",
" | Returns the probability of the sample for each class in the model,\n",
" | where classes are ordered as they are in ``self.classes_``.\n",
" | \n",
" | ----------------------------------------------------------------------\n",
" | Methods inherited from sklearn.base.BaseEstimator:\n",
" | \n",
" | __getstate__(self)\n",
" | \n",
" | __repr__(self)\n",
" | Return repr(self).\n",
" | \n",
" | __setstate__(self, state)\n",
" | \n",
" | get_params(self, deep=True)\n",
" | Get parameters for this estimator.\n",
" | \n",
" | Parameters\n",
" | ----------\n",
" | deep : boolean, optional\n",
" | If True, will return the parameters for this estimator and\n",
" | contained subobjects that are estimators.\n",
" | \n",
" | Returns\n",
" | -------\n",
" | params : mapping of string to any\n",
" | Parameter names mapped to their values.\n",
" | \n",
" | set_params(self, **params)\n",
" | Set the parameters of this estimator.\n",
" | \n",
" | The method works on simple estimators as well as on nested objects\n",
" | (such as pipelines). The latter have parameters of the form\n",
" | ``__`` so that it's possible to update each\n",
" | component of a nested object.\n",
" | \n",
" | Returns\n",
" | -------\n",
" | self\n",
" | \n",
" | ----------------------------------------------------------------------\n",
" | Data descriptors inherited from sklearn.base.BaseEstimator:\n",
" | \n",
" | __dict__\n",
" | dictionary for instance variables (if defined)\n",
" | \n",
" | __weakref__\n",
" | list of weak references to the object (if defined)\n",
" | \n",
" | ----------------------------------------------------------------------\n",
" | Methods inherited from sklearn.linear_model.base.LinearClassifierMixin:\n",
" | \n",
" | decision_function(self, X)\n",
" | Predict confidence scores for samples.\n",
" | \n",
" | The confidence score for a sample is the signed distance of that\n",
" | sample to the hyperplane.\n",
" | \n",
" | Parameters\n",
" | ----------\n",
" | X : array_like or sparse matrix, shape (n_samples, n_features)\n",
" | Samples.\n",
" | \n",
" | Returns\n",
" | -------\n",
" | array, shape=(n_samples,) if n_classes == 2 else (n_samples, n_classes)\n",
" | Confidence scores per (sample, class) combination. In the binary\n",
" | case, confidence score for self.classes_[1] where >0 means this\n",
" | class would be predicted.\n",
" | \n",
" | predict(self, X)\n",
" | Predict class labels for samples in X.\n",
" | \n",
" | Parameters\n",
" | ----------\n",
" | X : array_like or sparse matrix, shape (n_samples, n_features)\n",
" | Samples.\n",
" | \n",
" | Returns\n",
" | -------\n",
" | C : array, shape [n_samples]\n",
" | Predicted class label per sample.\n",
" | \n",
" | ----------------------------------------------------------------------\n",
" | Methods inherited from sklearn.base.ClassifierMixin:\n",
" | \n",
" | score(self, X, y, sample_weight=None)\n",
" | Returns the mean accuracy on the given test data and labels.\n",
" | \n",
" | In multi-label classification, this is the subset accuracy\n",
" | which is a harsh metric since you require for each sample that\n",
" | each label set be correctly predicted.\n",
" | \n",
" | Parameters\n",
" | ----------\n",
" | X : array-like, shape = (n_samples, n_features)\n",
" | Test samples.\n",
" | \n",
" | y : array-like, shape = (n_samples) or (n_samples, n_outputs)\n",
" | True labels for X.\n",
" | \n",
" | sample_weight : array-like, shape = [n_samples], optional\n",
" | Sample weights.\n",
" | \n",
" | Returns\n",
" | -------\n",
" | score : float\n",
" | Mean accuracy of self.predict(X) wrt. y.\n",
" | \n",
" | ----------------------------------------------------------------------\n",
" | Methods inherited from sklearn.linear_model.base.SparseCoefMixin:\n",
" | \n",
" | densify(self)\n",
" | Convert coefficient matrix to dense array format.\n",
" | \n",
" | Converts the ``coef_`` member (back) to a numpy.ndarray. This is the\n",
" | default format of ``coef_`` and is required for fitting, so calling\n",
" | this method is only required on models that have previously been\n",
" | sparsified; otherwise, it is a no-op.\n",
" | \n",
" | Returns\n",
" | -------\n",
" | self : estimator\n",
" | \n",
" | sparsify(self)\n",
" | Convert coefficient matrix to sparse format.\n",
" | \n",
" | Converts the ``coef_`` member to a scipy.sparse matrix, which for\n",
" | L1-regularized models can be much more memory- and storage-efficient\n",
" | than the usual numpy.ndarray representation.\n",
" | \n",
" | The ``intercept_`` member is not converted.\n",
" | \n",
" | Notes\n",
" | -----\n",
" | For non-sparse models, i.e. when there are not many zeros in ``coef_``,\n",
" | this may actually *increase* memory usage, so use this method with\n",
" | care. A rule of thumb is that the number of zero elements, which can\n",
" | be computed with ``(coef_ == 0).sum()``, must be more than 50% for this\n",
" | to provide significant benefits.\n",
" | \n",
" | After calling this method, further fitting with the partial_fit\n",
" | method (if any) will not work until you call densify.\n",
" | \n",
" | Returns\n",
" | -------\n",
" | self : estimator\n",
"\n"
]
}
],
"source": [
"help(LogisticRegression)"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [],
"source": [
"# Definimos el modelo\n",
"logreg=LogisticRegression(random_state=0, solver='lbfgs') #Se puede fijar un solver"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'C': 1.0,\n",
" 'class_weight': None,\n",
" 'dual': False,\n",
" 'fit_intercept': True,\n",
" 'intercept_scaling': 1,\n",
" 'max_iter': 100,\n",
" 'multi_class': 'warn',\n",
" 'n_jobs': None,\n",
" 'penalty': 'l2',\n",
" 'random_state': 0,\n",
" 'solver': 'lbfgs',\n",
" 'tol': 0.0001,\n",
" 'verbose': 0,\n",
" 'warm_start': False}"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# parámetros del modelo\n",
"logreg.get_params()"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [],
"source": [
"# Definimos X e y\n",
"X = dt.drop('Survived',axis=1) # Para que me de un dataframe y no una serie\n",
"y=dt['Survived']"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [],
"source": [
"# Ajustamos\n",
"logreg = logreg.fit(X,y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Coeficientes del modelo: $\\beta$ = `coef_`, $\\alpha$ = `intercept_`.\n",
"\n",
"$$p=\\frac{1}{1+e^{-(\\alpha+\\beta x)}}$$"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"alpha: [1.01628767]\n",
"beta: [[-2.44597988]]\n"
]
}
],
"source": [
"print('alpha: ', logreg.intercept_)\n",
"print('beta: ', logreg.coef_)"
]
},
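{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check (a minimal sketch, assuming only the fitted `logreg` above), we can plug $\\alpha$ and $\\beta$ into the formula by hand and compare against `predict_proba`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Compute p = 1 / (1 + exp(-(alpha + beta*x))) by hand for\n",
"# x = 0 (female) and x = 1 (male)\n",
"alpha = logreg.intercept_[0]\n",
"beta = logreg.coef_[0, 0]\n",
"for x in (0, 1):\n",
"    print('x =', x, '-> p =', 1 / (1 + np.exp(-(alpha + beta * x))))\n",
"\n",
"# Same numbers straight from the model: P(Survived = 1) for x = 0 and x = 1\n",
"logreg.predict_proba([[0], [1]])[:, 1]"
]
},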
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Odds de la mujer: $e^\\alpha$."
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([2.76291884])"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.exp(logreg.intercept_)"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2.876543209876542"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"odds_mujer"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Odds del hombre: $e^{\\alpha+\\beta}$"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[0.23938259]])"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.exp(logreg.intercept_+logreg.coef_)"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.23290598290598288"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"odds_hombre"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Odds ratio: $e^\\beta$"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[0.0866412]])"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.exp(logreg.coef_)"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.08096731594585674"
]
},
"execution_count": 56,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"odds_ratio"
]
},
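{
"cell_type": "markdown",
"metadata": {},
"source": [
"The fitted odds (2.76, 0.239, 0.0866) are close to, but not exactly, the empirical odds (2.88, 0.233, 0.0810). The gap comes from the L2 penalty (`C=1.0` by default), which shrinks $\\alpha$ and $\\beta$ towards zero. As a minimal sketch (the variable name and the large `C` value are chosen here just for illustration), refitting with an essentially negligible penalty recovers the contingency-table odds almost exactly:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# With C very large the penalty is negligible and the model\n",
"# reproduces the empirical odds from the contingency table\n",
"logreg_unreg = LogisticRegression(C=1e9, solver='lbfgs').fit(X, y)\n",
"print(np.exp(logreg_unreg.intercept_))                        # ~ odds_mujer\n",
"print(np.exp(logreg_unreg.intercept_ + logreg_unreg.coef_))  # ~ odds_hombre\n",
"print(np.exp(logreg_unreg.coef_))                             # ~ odds_ratio"
]
},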
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [],
"source": [
"# Predecimos\n",
"y_pred = logreg.predict(X)"
]
},
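{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the binary case, `predict` is equivalent to thresholding `predict_proba` at 0.5. A quick sketch (it assumes nothing beyond the fitted `logreg` and `y_pred` above):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# predict() labels a sample 1 exactly when P(Survived = 1 | x) > 0.5\n",
"proba = logreg.predict_proba(X)[:, 1]\n",
"np.array_equal((proba > 0.5).astype(int), y_pred)"
]
},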
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Hagamos la confusion matrix:"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[468, 109],\n",
" [ 81, 233]])"
]
},
"execution_count": 58,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sklearn.metrics import confusion_matrix\n",
"confusion_matrix(y_pred,y)"
]
},
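{
"cell_type": "markdown",
"metadata": {},
"source": [
"The diagonal of the confusion matrix holds the correct predictions, so accuracy is just the trace divided by the total count (a quick check using the matrix above):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Accuracy = correct predictions / all predictions\n",
"cm = confusion_matrix(y, y_pred)\n",
"cm.diagonal().sum() / cm.sum()"
]
},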
{
"cell_type": "markdown",
"metadata": {},
"source": [
"La precisión del modelo la podemos medir con su accuracy. Se puede obtener con `.score`, que nos evalúa unos datos sobre un modelo y calcula el error o con `accuracy_score` que calcula la accuracy entre un vector de `y` reales otro de predichas"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.7867564534231201"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"logreg.score(X,y)"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.7867564534231201"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sklearn.metrics import accuracy_score\n",
"accuracy_score(y,y_pred)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Hagamos otro ejemplo con más variables:"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"