{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# INTRODUCTION TO MACHINE LEARNING"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction. What is a model?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Suppose, for example, that a real-estate company hires us.\n",
"\n",
"The company wants to predict the market price of a property.\n",
"\n",
"How did they do it before hiring you? They will tell you \"by intuition\". What they really do is identify patterns based on houses they have seen sold in the past.\n",
"\n",
"We will do essentially the same, but in an automatic and more refined way.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, we can build a **decision tree**.\n",
"\n",
"Does the house have more than 2 bedrooms?\n",
"  + YES. Predicted price: 188000 €.\n",
"  + NO. Predicted price: 178000 €."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The predicted prices could have been obtained, for example, as the mean price of the houses in each category.\n",
"This step is known as **fitting** or **training** the model.\n",
"\n",
"We use **training data** to **fit** the model."
]
},
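{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough sketch of what this fitting step does, the cell below computes the mean price of each category for a handful of made-up houses. The numbers are hypothetical, chosen only so that the means reproduce the 188000 € and 178000 € of the tree above; the names `houses` and `fit_stump` are illustrative and do not belong to any library."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from statistics import mean\n",
"\n",
"# Hypothetical training data: (bedrooms, sale price in euros)\n",
"houses = [(1, 170000), (2, 180000), (2, 184000),\n",
"          (3, 186000), (3, 188000), (4, 190000)]\n",
"\n",
"def fit_stump(data, threshold=2):\n",
"    # Split the houses by the question 'more than `threshold` bedrooms?'\n",
"    yes = [price for bedrooms, price in data if bedrooms > threshold]\n",
"    no = [price for bedrooms, price in data if bedrooms <= threshold]\n",
"    # The prediction of each leaf is the mean price of its houses\n",
"    return {'yes': mean(yes), 'no': mean(no)}\n",
"\n",
"leaf_prices = fit_stump(houses)\n",
"print(leaf_prices)"
]
},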
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can add more variables:\n",
"\n",
"Does the house have more than 2 bedrooms?\n",
"+ YES. Is it larger than 150 m^2?\n",
"  + YES. Predicted price: 233000 €\n",
"  + NO. Predicted price: 188000 €\n",
"+ NO. Is it larger than 100 m^2?\n",
"  + YES. Predicted price: 170000 €\n",
"  + NO. Predicted price: 146000 €.\n",
"\n",
"The blocks at the end of the tree, where the price predictions are made, are known as the **leaves** of the tree."
]
},
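{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two-level tree above can be written directly as nested `if` statements. This is only a sketch of how an already fitted tree turns the answers to its questions into a prediction; the function name `predict_price` and its arguments are illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def predict_price(bedrooms, area_m2):\n",
"    # First split: more than 2 bedrooms?\n",
"    if bedrooms > 2:\n",
"        # Second split on this branch: larger than 150 m^2?\n",
"        return 233000 if area_m2 > 150 else 188000\n",
"    # Second split on the other branch: larger than 100 m^2?\n",
"    return 170000 if area_m2 > 100 else 146000\n",
"\n",
"print(predict_price(bedrooms=3, area_m2=160))\n",
"print(predict_price(bedrooms=2, area_m2=80))"
]
},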
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Exploring the data with `pandas`"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"casas = pd.read_csv('melb_data.csv')"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Suburb Address Rooms Type Price Method SellerG \\\n",
"0 Abbotsford 85 Turner St 2 h 1480000.0 S Biggin \n",
"1 Abbotsford 25 Bloomburg St 2 h 1035000.0 S Biggin \n",
"2 Abbotsford 5 Charles St 3 h 1465000.0 SP Biggin \n",
"3 Abbotsford 40 Federation La 3 h 850000.0 PI Biggin \n",
"4 Abbotsford 55a Park St 4 h 1600000.0 VB Nelson \n",
"\n",
" Date Distance Postcode ... Bathroom Car Landsize BuildingArea \\\n",
"0 3/12/2016 2.5 3067.0 ... 1.0 1.0 202.0 NaN \n",
"1 4/02/2016 2.5 3067.0 ... 1.0 0.0 156.0 79.0 \n",
"2 4/03/2017 2.5 3067.0 ... 2.0 0.0 134.0 150.0 \n",
"3 4/03/2017 2.5 3067.0 ... 2.0 1.0 94.0 NaN \n",
"4 4/06/2016 2.5 3067.0 ... 1.0 2.0 120.0 142.0 \n",
"\n",
" YearBuilt CouncilArea Lattitude Longtitude Regionname \\\n",
"0 NaN Yarra -37.7996 144.9984 Northern Metropolitan \n",
"1 1900.0 Yarra -37.8079 144.9934 Northern Metropolitan \n",
"2 1900.0 Yarra -37.8093 144.9944 Northern Metropolitan \n",
"3 NaN Yarra -37.7969 144.9969 Northern Metropolitan \n",
"4 2014.0 Yarra -37.8072 144.9941 Northern Metropolitan \n",
"\n",
" Propertycount \n",
"0 4019.0 \n",
"1 4019.0 \n",
"2 4019.0 \n",
"3 4019.0 \n",
"4 4019.0 \n",
"\n",
"[5 rows x 21 columns]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"casas.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, let's look at all the columns:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['Suburb', 'Address', 'Rooms', 'Type', 'Price', 'Method', 'SellerG',\n",
" 'Date', 'Distance', 'Postcode', 'Bedroom2', 'Bathroom', 'Car',\n",
" 'Landsize', 'BuildingArea', 'YearBuilt', 'CouncilArea', 'Lattitude',\n",
" 'Longtitude', 'Regionname', 'Propertycount'],\n",
" dtype='object')"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"casas.columns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can apply the usual preprocessing we have used on other occasions:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"import seaborn as sns  # needed for the count plots below\n",
"\n",
"def is_number(s):\n",
"    # True if `s` can be converted to float (NaN also converts)\n",
"    try:\n",
"        float(s)\n",
"        return True\n",
"    except ValueError:\n",
"        return False\n",
"\n",
"def clasifica_columnas(datos):\n",
"    # Classify each column as numeric or categorical from its first row\n",
"    numericas = []\n",
"    categoricas = []\n",
"    for i in datos.columns:\n",
"        if is_number(datos.loc[0, i]):\n",
"            numericas.append(i)\n",
"        else:\n",
"            categoricas.append(i)\n",
"    return numericas, categoricas\n",
"\n",
"def procesamiento(i, datos):\n",
"    # Uses the global lists `numericas` and `categoricas` defined in the next cell\n",
"    if i in numericas:\n",
"        print('Missings: ', datos.shape[0] - sum(datos[i].apply(is_number)))\n",
"        print('NaNs: ', datos[i].isna().sum())\n",
"        print('Ceros: ', (datos[i] == 0).sum())  # count values exactly equal to zero\n",
"        print('Tipo: ', type(datos.loc[0, i]))\n",
"    elif i in categoricas:\n",
"        print(datos[i].unique())\n",
"        ax = sns.countplot(x=datos[i])\n",
"        ax.set_xticklabels(ax.get_xticklabels(), rotation=90)  # rotate the tick labels"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"numericas, categoricas = clasifica_columnas(casas)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Rooms\n",
"Missings: 0\n",
"NaNs: 0\n",
"Ceros: 0\n",
"Tipo: \n",
"\n",
"\n",
"Price\n",
"Missings: 0\n",
"NaNs: 0\n",
"Ceros: 0\n",
"Tipo: \n",
"\n",
"\n",
"Distance\n",
"Missings: 0\n",
"NaNs: 0\n",
"Ceros: 6\n",
"Tipo: \n",
"\n",
"\n",
"Postcode\n",
"Missings: 0\n",
"NaNs: 0\n",
"Ceros: 0\n",
"Tipo: \n",
"\n",
"\n",
"Bedroom2\n",
"Missings: 0\n",
"NaNs: 0\n",
"Ceros: 707\n",
"Tipo: \n",
"\n",
"\n",
"Bathroom\n",
"Missings: 0\n",
"NaNs: 0\n",
"Ceros: 7546\n",
"Tipo: \n",
"\n",
"\n",
"Car\n",
"Missings: 0\n",
"NaNs: 62\n",
"Ceros: 6535\n",
"Tipo: \n",
"\n",
"\n",
"Landsize\n",
"Missings: 0\n",
"NaNs: 0\n",
"Ceros: 1941\n",
"Tipo: \n",
"\n",
"\n",
"BuildingArea\n",
"Missings: 0\n",
"NaNs: 6450\n",
"Ceros: 28\n",
"Tipo: \n",
"\n",
"\n",
"YearBuilt\n",
"Missings: 0\n",
"NaNs: 5375\n",
"Ceros: 0\n",
"Tipo: \n",
"\n",
"\n",
"Lattitude\n",
"Missings: 0\n",
"NaNs: 0\n",
"Ceros: 0\n",
"Tipo: \n",
"\n",
"\n",
"Longtitude\n",
"Missings: 0\n",
"NaNs: 0\n",
"Ceros: 0\n",
"Tipo: \n",
"\n",
"\n",
"Propertycount\n",
"Missings: 0\n",
"NaNs: 0\n",
"Ceros: 0\n",
"Tipo: \n",
"\n",
"\n"
]
}
],
"source": [
"for i in numericas:\n",
"    print(i)\n",
"    procesamiento(i, casas)\n",
"    print('\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Fortunately, every entry can be parsed as a number, so there are no malformed values. What could explain the zeros and the NaNs?"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 2.0\n",
"1 2.0\n",
"2 3.0\n",
"3 3.0\n",
"4 3.0\n",
"Name: Bedroom2, dtype: float64"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"casas['Bedroom2'].head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The NaNs most likely mean that the value was not recorded. The zeros are less clear. Judging by the values above, `Bedroom2` appears to be a bedroom count (it closely tracks `Rooms`), so a zero there probably means either a property with no separate bedroom, such as a studio, or a value that was not recorded."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What is the average size of a house? How old is the newest house?"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Rooms Price Distance Postcode Bedroom2 \\\n",
"count 13580.000000 1.358000e+04 13580.000000 13580.000000 13580.000000 \n",
"mean 2.937997 1.075684e+06 10.137776 3105.301915 2.914728 \n",
"std 0.955748 6.393107e+05 5.868725 90.676964 0.965921 \n",
"min 1.000000 8.500000e+04 0.000000 3000.000000 0.000000 \n",
"25% 2.000000 6.500000e+05 6.100000 3044.000000 2.000000 \n",
"50% 3.000000 9.030000e+05 9.200000 3084.000000 3.000000 \n",
"75% 3.000000 1.330000e+06 13.000000 3148.000000 3.000000 \n",
"max 10.000000 9.000000e+06 48.100000 3977.000000 20.000000 \n",
"\n",
" Bathroom Car Landsize BuildingArea YearBuilt \\\n",
"count 13580.000000 13518.000000 13580.000000 7130.000000 8205.000000 \n",
"mean 1.534242 1.610075 558.416127 151.967650 1964.684217 \n",
"std 0.691712 0.962634 3990.669241 541.014538 37.273762 \n",
"min 0.000000 0.000000 0.000000 0.000000 1196.000000 \n",
"25% 1.000000 1.000000 177.000000 93.000000 1940.000000 \n",
"50% 1.000000 2.000000 440.000000 126.000000 1970.000000 \n",
"75% 2.000000 2.000000 651.000000 174.000000 1999.000000 \n",
"max 8.000000 10.000000 433014.000000 44515.000000 2018.000000 \n",
"\n",
" Lattitude Longtitude Propertycount \n",
"count 13580.000000 13580.000000 13580.000000 \n",
"mean -37.809203 144.995216 7454.417378 \n",
"std 0.079260 0.103916 4378.581772 \n",
"min -38.182550 144.431810 249.000000 \n",
"25% -37.856822 144.929600 4380.000000 \n",
"50% -37.802355 145.000100 6555.000000 \n",
"75% -37.756400 145.058305 10331.000000 \n",
"max -37.408530 145.526350 21650.000000 "
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"casas.describe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Selecting the data for modeling"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before anything else, let's drop every row that contains a NaN. This keeps 6196 of the 13580 rows."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"casas=casas.dropna()"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Suburb Address Rooms Type Price Method SellerG \\\n",
"1 Abbotsford 25 Bloomburg St 2 h 1035000.0 S Biggin \n",
"2 Abbotsford 5 Charles St 3 h 1465000.0 SP Biggin \n",
"4 Abbotsford 55a Park St 4 h 1600000.0 VB Nelson \n",
"6 Abbotsford 124 Yarra St 3 h 1876000.0 S Nelson \n",
"7 Abbotsford 98 Charles St 2 h 1636000.0 S Nelson \n",
"\n",
" Date Distance Postcode ... Bathroom Car Landsize BuildingArea \\\n",
"1 4/02/2016 2.5 3067.0 ... 1.0 0.0 156.0 79.0 \n",
"2 4/03/2017 2.5 3067.0 ... 2.0 0.0 134.0 150.0 \n",
"4 4/06/2016 2.5 3067.0 ... 1.0 2.0 120.0 142.0 \n",
"6 7/05/2016 2.5 3067.0 ... 2.0 0.0 245.0 210.0 \n",
"7 8/10/2016 2.5 3067.0 ... 1.0 2.0 256.0 107.0 \n",
"\n",
" YearBuilt CouncilArea Lattitude Longtitude Regionname \\\n",
"1 1900.0 Yarra -37.8079 144.9934 Northern Metropolitan \n",
"2 1900.0 Yarra -37.8093 144.9944 Northern Metropolitan \n",
"4 2014.0 Yarra -37.8072 144.9941 Northern Metropolitan \n",
"6 1910.0 Yarra -37.8024 144.9993 Northern Metropolitan \n",
"7 1890.0 Yarra -37.8060 144.9954 Northern Metropolitan \n",
"\n",
" Propertycount \n",
"1 4019.0 \n",
"2 4019.0 \n",
"4 4019.0 \n",
"6 4019.0 \n",
"7 4019.0 \n",
"\n",
"[5 rows x 21 columns]"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"casas.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First we choose the variable whose values we want to predict, the **prediction target**. This variable is usually denoted by $y$."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this case $y$ will be the price:"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"y=casas.Price"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The columns we use to predict $y$ are those we consider relevant to the price. They are called **features**, and their data is denoted by $X$. "
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"# Select a few columns as features\n",
"features = ['Rooms', 'Bathroom', 'BuildingArea', 'YearBuilt', 'Landsize', 'Lattitude', 'Longtitude']\n",
"X = casas[features]"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Rooms Bathroom BuildingArea YearBuilt Landsize Lattitude Longtitude\n",
"1 2 1.0 79.0 1900.0 156.0 -37.8079 144.9934\n",
"2 3 2.0 150.0 1900.0 134.0 -37.8093 144.9944\n",
"4 4 1.0 142.0 2014.0 120.0 -37.8072 144.9941\n",
"6 3 2.0 210.0 1910.0 245.0 -37.8024 144.9993\n",
"7 2 1.0 107.0 1890.0 256.0 -37.8060 144.9954"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X.head()"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Rooms Bathroom BuildingArea YearBuilt Landsize \\\n",
"count 6196.000000 6196.000000 6196.000000 6196.000000 6196.000000 \n",
"mean 2.931407 1.576340 141.568645 1964.081988 471.006940 \n",
"std 0.971079 0.711362 90.834824 38.105673 897.449881 \n",
"min 1.000000 1.000000 0.000000 1196.000000 0.000000 \n",
"25% 2.000000 1.000000 91.000000 1940.000000 152.000000 \n",
"50% 3.000000 1.000000 124.000000 1970.000000 373.000000 \n",
"75% 4.000000 2.000000 170.000000 2000.000000 628.000000 \n",
"max 8.000000 8.000000 3112.000000 2018.000000 37000.000000 \n",
"\n",
" Lattitude Longtitude \n",
"count 6196.000000 6196.000000 \n",
"mean -37.807904 144.990201 \n",
"std 0.075850 0.099165 \n",
"min -38.164920 144.542370 \n",
"25% -37.855438 144.926198 \n",
"50% -37.802250 144.995800 \n",
"75% -37.758200 145.052700 \n",
"max -37.457090 145.526350 "
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X.describe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Building the model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To build the model we will use the **scikit-learn** library (`sklearn`)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The steps to build a model are:\n",
"1. **Define**. What type of model will it be? In our case, a decision tree.\n",
"1. **Fit**. Capture patterns from the data provided.\n",
"1. **Predict**.\n",
"1. **Evaluate**. Determine how good the model's predictions are."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's see an example:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We import the class that builds the decision tree:"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.tree import DecisionTreeRegressor"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. Define the model:"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [],
"source": [
"casas_modelo=DecisionTreeRegressor(random_state=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"(`random_state` fixes the seed of the random number generator, so that every run produces the same result.)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"2. Fit the model to the data:"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"DecisionTreeRegressor(criterion='mse', max_depth=None, max_features=None,\n",
" max_leaf_nodes=None, min_impurity_decrease=0.0,\n",
" min_impurity_split=None, min_samples_leaf=1,\n",
" min_samples_split=2, min_weight_fraction_leaf=0.0,\n",
" presort=False, random_state=1, splitter='best')"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"casas_modelo.fit(X,y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"3. To try it out, we predict the prices of the first 5 houses in the training data:"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [],
"source": [
"prediccioneshead=casas_modelo.predict(X.head())"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"We predict the prices of the following 5 houses: \n",
" Rooms Bathroom BuildingArea YearBuilt Landsize Lattitude Longtitude\n",
"1 2 1.0 79.0 1900.0 156.0 -37.8079 144.9934\n",
"2 3 2.0 150.0 1900.0 134.0 -37.8093 144.9944\n",
"4 4 1.0 142.0 2014.0 120.0 -37.8072 144.9941\n",
"6 3 2.0 210.0 1910.0 245.0 -37.8024 144.9993\n",
"7 2 1.0 107.0 1890.0 256.0 -37.8060 144.9954\n",
"\n",
"\n",
"The predicted prices are: \n",
"[1035000. 1465000. 1600000. 1876000. 1636000.]\n"
]
}
],
"source": [
"print('We predict the prices of the following 5 houses: ')\n",
"print(X.head())\n",
"print('\\n')  # Blank lines\n",
"print('The predicted prices are: ')\n",
"print(prediccioneshead)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's check whether they match the actual prices."
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1 1035000.0\n",
"2 1465000.0\n",
"4 1600000.0\n",
"6 1876000.0\n",
"7 1636000.0\n",
"Name: Price, dtype: float64"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"casas.Price.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"They are exactly the same prices."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Model validation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To evaluate the accuracy of the model, we must compare the predicted values with the actual values of the target. Since we cannot compare them one by one, we need **metrics**, that is, numbers that summarize the accuracy of the model."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One example is the _mean absolute error_ (**MAE**). \n",
"\n",
"The prediction error for each house is simply\n",
"```\n",
"error = actual_price - predicted_price\n",
"```\n",
"$$\n",
"E_i = y_i - \\hat{y}_i.\n",
"$$\n",
"\n",
"The MAE is the average of the absolute values of the errors over all houses:\n",
"$$\n",
"MAE = \\frac{\\sum_{i=1}^n |E_i|}{n}.\n",
"$$"
]
},
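{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small worked example, the MAE can be computed directly from its definition. The prices below are made up for illustration and do not come from the dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y_true = [100000, 200000, 300000]  # hypothetical actual prices\n",
"y_pred = [110000, 190000, 330000]  # hypothetical predicted prices\n",
"\n",
"# E_i = y_i - y_hat_i for every house\n",
"errors = [y - y_hat for y, y_hat in zip(y_true, y_pred)]\n",
"\n",
"# MAE: mean of the absolute errors\n",
"mae = sum(abs(e) for e in errors) / len(errors)\n",
"print(errors)\n",
"print(mae)"
]
},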
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Another example is the _mean squared error_ (**MSE**) and its square root (**RMSE**).\n",
"\n",
"$$\n",
"MSE = \\frac{\\sum_{i=1}^n E_i^2}{n},\n",
"$$\n",
"\n",
"$$\n",
"RMSE = \\sqrt{MSE}.\n",
"$$"
]
},
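{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of these two metrics, again with made-up prices rather than values from the dataset. Because the errors are squared, the largest error weighs the most, and taking the square root brings the result back to the units of the price."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"\n",
"y_true = [100000, 200000, 300000]  # hypothetical actual prices\n",
"y_pred = [110000, 190000, 330000]  # hypothetical predicted prices\n",
"\n",
"errors = [y - y_hat for y, y_hat in zip(y_true, y_pred)]\n",
"\n",
"# MSE: mean of the squared errors\n",
"mse = sum(e ** 2 for e in errors) / len(errors)\n",
"# RMSE: square root of the MSE\n",
"rmse = math.sqrt(mse)\n",
"print(mse, rmse)"
]
},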
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will simply compute it with a function from the scikit-learn library."
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.metrics import mean_absolute_error"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [],
"source": [
"predicciones = casas_modelo.predict(X)"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"434.71594577146544"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mean_absolute_error(y,predicciones)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The mean error is about 434 dollars, and we are dealing with prices on the order of one million. This suggests that the model works very well."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"However, there is a problem. We are evaluating the model on the very data it was trained on, so it is no surprise that it performs well there. What matters is that it predicts well the prices of houses that are not in the training data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The most direct way to address this is to use only part of the data as training data and keep the rest to test the model. The held-out part is the **validation data**. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is easy to do in Python with the `train_test_split` function from scikit-learn, which splits the data at random into training data and validation data (by default, 25% of the rows go to the validation set). Again, fixing `random_state` ensures that we get the same split every time we run it."
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.model_selection import train_test_split"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Error: 252318.87217559715\n"
]
}
],
"source": [
"# Split the data into training and validation sets\n",
"train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)\n",
"\n",
"# Define the model\n",
"casas_modelo = DecisionTreeRegressor(random_state=1)\n",
"\n",
"# Fit the model to the training data\n",
"casas_modelo.fit(train_X, train_y)\n",
"\n",
"# Predict the prices of the validation data\n",
"val_predicciones = casas_modelo.predict(val_X)\n",
"\n",
"# Evaluate\n",
"print('Error: ', mean_absolute_error(val_y, val_predicciones))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Yikes! The error is more than 250K dollars, roughly a quarter of the mean price, which is a lot. It turns out the model was not as good as it seemed."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Underfitting and overfitting"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using very few leaves gives imprecise predictions, because the tree cannot capture the variety of characteristics present in the data. This is known as **underfitting**."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To gain accuracy we could add more splits to the tree. Note that the number of leaves grows quickly: if every branch is split $n$ times in a row (a tree of depth $n$), the tree has $2^n$ leaves."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"However, with many leaves there are fewer houses in each leaf. Leaves with very few houses predict those same houses very well, but since each prediction rests on so few houses, predictions for new data will be poor. This is known as **overfitting**."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In general we get a curve like this:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![image](sweetspot.jpg)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"How do we find the **sweet spot**?\n",
"\n",
"`DecisionTreeRegressor` has a `max_leaf_nodes` parameter that lets us control the complexity of the model. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can define a function that returns the MAE obtained for a given number of leaves:"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [],
"source": [
"def get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y):\n",
"    modelo = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=0)  # Define\n",
"    modelo.fit(train_X, train_y)  # Fit\n",
"    preds_val = modelo.predict(val_X)  # Predict\n",
"    mae = mean_absolute_error(val_y, preds_val)  # Evaluate\n",
"    return mae"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With a `for` loop we can compare the accuracy for different values of `max_leaf_nodes`."
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Max leaf nodes: 5\n",
"MAE: 324110.91454760684\n",
"\n",
"\n",
"Max leaf nodes: 50\n",
"MAE: 252108.93840069315\n",
"\n",
"\n",
"Max leaf nodes: 500\n",
"MAE: 240322.85576390434\n",
"\n",
"\n",
"Max leaf nodes: 5000\n",
"MAE: 250809.83989670756\n",
"\n",
"\n"
]
}
],
"source": [
"for max_leaf_nodes in [5, 50, 500, 5000]:\n",
" my_mae = get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y)\n",
" print(\"Max leaf nodes: \", max_leaf_nodes)\n",
" print(\"MAE: \", my_mae)\n",
" print(\"\\n\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Of the options we tried, the best one is 500, which gives the lowest validation MAE."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A more direct way is to store the MAE of each candidate in a list and take the minimum:"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [],
"source": [
"candidatas_max_leaf_nodes = [5, 25, 50, 100, 250, 500]\n",
"\n",
"# Validation MAE for each candidate value of max_leaf_nodes\n",
"my_mae = []\n",
"for i in candidatas_max_leaf_nodes:\n",
"    my_mae.append(get_mae(i, train_X, val_X, train_y, val_y))\n",
"\n",
"# Lowest MAE and the candidate that produced it\n",
"mae_optimo = min(my_mae)\n",
"max_leaf_nodes_optimo = candidatas_max_leaf_nodes[my_mae.index(mae_optimo)]"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(238337.81564194776, 250)"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mae_optimo, max_leaf_nodes_optimo"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also see this graphically:"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[]"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAY0AAAD8CAYAAACLrvgBAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAHqRJREFUeJzt3X98VPWd7/HXJ5PMJDDhh8xAEVCwYiv9IdRUcdW7LraK9m713tWu9j6U7YOWe62t/bmt7t673ra7e23rrdVdtWtXr9iHW0urrWzVUtZqa38Ixt8gRaKiUqgkBDAByc/P/eN8E4YwkzkJSSbJvJ+Pxzxm5nO+Z/L9YjzvnPM954y5OyIiInFUlLoDIiIydig0REQkNoWGiIjEptAQEZHYFBoiIhKbQkNERGJTaIiISGwKDRERiU2hISIisVWWugNDLZPJ+Ny5c0vdDRGRMeXJJ59scvdssXbjLjTmzp1LfX19qbshIjKmmNmrcdrp8JSIiMSm0BARkdgUGiIiEptCQ0REYlNoiIhIbAoNERGJTaEhIiKxKTSCX/z+DW55tKHU3RARGdUUGsFjW5q45ZGXSt0NEZFRTaERZNIpWts6OdDRVequiIiMWgqNIJtOAdDY0lbinoiIjF4KjSBTmwSgqVWhISJSiEIjyIQ9jabW9hL3RERk9FJoBAdDQ3saIiKFFA0NM6s2s/Vm9qyZbTSzr4T63Wa22cw2mNkdZlYV6mZmN5lZg5k9Z2bvy/msZWa2JTyW5dRPNrPnwzo3mZmF+lFmtja0X2tmU4f+nyAyLR0OT2lOQ0SkoDh7Gm3AEnc/CVgILDWzxcDdwDuB9wA1wMdD+/OA+eGxArgVogAArgVOBU4Brs0JgVtD2571lob61cDD7j4feDi8HxapygSTqiu1pyEi0o+ioeGR1vC2Kjzc3R8MyxxYD8wObS4A7gqLHgemmNlM4Fxgrbs3u/tuYC1RAM0EJrn778Jn3QVcmPNZK8PrlTn1YZGpTWlOQ0SkH7HmNMwsYWbPADuJNvzrcpZVAZcBPwulWcDrOatvC7X+6tvy1AFmuPsOgPA8Pd6wBieTTtGoPQ0RkYJihYa7d7n7QqK9iVPM7N05i28BfuXuj4X3lu8jBlGPzcxWmFm9mdU3NjYOZNVDZNMpHZ4SEenHgM6ecvc9wKOEOQczuxbIAp/PabYNmJPzfjawvUh9dp46wBvh8BXheWeBft3m7nXuXpfNFv1e9IIy6aQmwkVE+hHn7KmsmU0Jr2uADwC/N7OPE81TXOru3TmrrAYuD2dRLQb2hkNLa4BzzGxqmAA/B1gTlrWY2eJw1tTlwP05n9VzltWynPqwyKRTvHmgk7ZO3UpERCSfyhhtZgIrzSxBFDKr3P2nZtYJvAr8Lpwhe5+7fxV4EDgfaAD2Ax8DcPdmM/sa8ET43K+6e3N4fQVwJ9FZWA+FB8B1wCozWw68Blx8BGMtKlMbXauxq7Wdo6fUDOePEhEZk4qGhrs/ByzKU8+7bjgD6soCy+4A7shTrwfenae+Czi7WB+HSu4FfgoNEZHD6YrwHJm07j8lItIfhUaOjO50KyLSL4VGjmytblooItIfhUaO6qoEtalK7WmIiBSg0OgjupWIQkNEJB+FRh+ZdFKhISJSgEKjj0xaNy0UESlEodFHRvefEhEpSKHRRyadYs/+Djq6uos3FhEpMwqNPjK10QV+u3SISkTkMAqNPvRd4SIihSk0+ui9KlyhISJyGIVGH9mePQ1d4CcichiFRh89cxo67VZE5HAKjT4mJCuZkExoTkNEJA+FRh66VkNEJD+FRh66lYiISH4KjTwy6RRNLZrTEBHpS6GRh+50KyKSn0Ijj0w6RfP+djp1KxERkUMoNPLIppO4Q/M+HaISEcml0MhDV4WLiOSn0Mgjo+8KFxHJS6GRh24lIiKSn0Ijj4N7GgoNEZFcCo08JiYTVFdVKDRERP
pQaORhZvqucBGRPBQaBej+UyIih1NoFJBJp2jURLiIyCEUGgVka5M6PCUi0odCo4BMOkXzvja6ur3UXRERGTUUGgVk0im6HXbv196GiEgPhUYBPbcS0WS4iMhBCo0CMunwXeH6Xg0RkV4KjQJ0VbiIyOEUGgXo8JSIyOEUGgVMqq4kmajQ7dFFRHIUDQ0zqzaz9Wb2rJltNLOvhPo8M1tnZlvM7Admlgz1VHjfEJbPzfmsa0J9s5mdm1NfGmoNZnZ1Tj3vzxgJ0a1EkprTEBHJEWdPow1Y4u4nAQuBpWa2GPg6cIO7zwd2A8tD++XAbnc/HrghtMPMFgCXAO8ClgK3mFnCzBLAzcB5wALg0tCWfn7GiNB3hYuIHKpoaHikNbytCg8HlgA/CvWVwIXh9QXhPWH52WZmoX6Pu7e5+ytAA3BKeDS4+8vu3g7cA1wQ1in0M0aEbiUiInKoWHMaYY/gGWAnsBZ4Cdjj7p2hyTZgVng9C3gdICzfC0zLrfdZp1B9Wj8/Y0Rk0kntaYiI5IgVGu7e5e4LgdlEewYn5msWnq3AsqGqH8bMVphZvZnVNzY25msyKJl0il372unWrURERIABnj3l7nuAR4HFwBQzqwyLZgPbw+ttwByAsHwy0Jxb77NOoXpTPz+jb79uc/c6d6/LZrMDGVK/MukUXd3Onrc6huwzRUTGsjhnT2XNbEp4XQN8ANgEPAJcFJotA+4Pr1eH94Tlv3B3D/VLwtlV84D5wHrgCWB+OFMqSTRZvjqsU+hnjIisLvATETlEZfEmzARWhrOcKoBV7v5TM3sBuMfM/h54Grg9tL8d+J6ZNRDtYVwC4O4bzWwV8ALQCVzp7l0AZvYpYA2QAO5w943hs75c4GeMiN4L/FraOGFG7Uj+aBGRUaloaLj7c8CiPPWXieY3+tYPABcX+Kx/AP4hT/1B4MG4P2OkZGujy0J0gZ+ISERXhPfj4K1EdIGfiAgoNPo1uaaKqoRpTkNEJFBo9MPMmDYxRZMu8BMRARQaRWVqdYGfiEgPhUYRmXRKcxoiIoFCo4goNLSnISICCo2iMukUu1rbia41FBEpbwqNIjLpJO1d3bz5VmfxxiIi45xCo4ieW4noAj8REYVGUfqucBGRgxQaRSg0REQOUmgUkUlH95/SBX4iIgqNoqZOSJKoMM1piIig0CiqosI4amKSphZd4CciotCIQRf4iYhEFBoxZNK6/5SICCg0Ysnq/lMiIoBCI5ZsbYrG1jbdSkREyp5CI4ZMOkV7ZzctbbqViIiUN4VGDJlaXashIgIKjVj0XeEiIhGFRgy6lYiISEShEYNCQ0QkotCI4aiJSSpMcxoiIgqNGBLhViKNmtMQkTKn0IhJtxIREVFoxKbQEBFRaMSm+0+JiCg0YsukU7o9uoiUPYVGTJnaFG91dLFPtxIRkTKm0Iip51qNRp12KyJlTKERU+93hWteQ0TKmEIjJl0VLiKi0IgtWxsOT+kCPxEpYwqNmI6aqNuji4goNGKqSlQwdUKVDk+JSFlTaAyArgoXkXJXNDTMbI6ZPWJmm8xso5l9JtQXmtnjZvaMmdWb2SmhbmZ2k5k1mNlzZva+nM9aZmZbwmNZTv1kM3s+rHOTmVmoH2Vma0P7tWY2dej/CeKLQkNzGiJSvuLsaXQCX3D3E4HFwJVmtgD4BvAVd18I/F14D3AeMD88VgC3QhQAwLXAqcApwLU5IXBraNuz3tJQvxp42N3nAw+H9yWTrdWehoiUt6Kh4e473P2p8LoF2ATMAhyYFJpNBraH1xcAd3nkcWCKmc0EzgXWunuzu+8G1gJLw7JJ7v47d3fgLuDCnM9aGV6vzKmXRHQrEYWGiJSvyoE0NrO5wCJgHfBZYI2ZXU8UPn8Sms0CXs9ZbVuo9VfflqcOMMPdd0AUXmY2fSD9HWqZ2iT72rt4q72LmmSilF0RESmJ2BPhZpYG7gU+6+5vAlcAn3P3OcDngNt7muZZ3QdRj83MVoR5lf
rGxsaBrDogusBPRMpdrNAwsyqiwLjb3e8L5WVAz+sfEs1TQLSnMCdn9dlEh676q8/OUwd4Ixy+IjzvzNc/d7/N3evcvS6bzcYZ0qBke+4/pdAQkTIV5+wpI9qL2OTu38pZtB340/B6CbAlvF4NXB7OoloM7A2HmNYA55jZ1DABfg6wJixrMbPF4WddDtyf81k9Z1kty6mXRO+ehuY1RKRMxZnTOB24DHjezJ4Jtb8BPgHcaGaVwAGis58AHgTOBxqA/cDHANy92cy+BjwR2n3V3ZvD6yuAO4Ea4KHwALgOWGVmy4HXgIsHMcYhk6ntuWmhTrsVkfJUNDTc/dfkn3cAODlPeweuLPBZdwB35KnXA+/OU98FnF2sjyNl2kTNaYhIedMV4QOQrKxgco1uJSIi5UuhMUD6rnARKWcKjQHSd4WLSDlTaAxQpjalU25FpGwpNAYoq1uJiEgZU2gMUCadpKWtkwMdXaXuiojIiFNoDJBuJSIi5UyhMUAHQ0OT4SJSfhQaA5Sp1a1ERKR8KTQGKJPuuZWIQkNEyo9CY4A0pyEi5UyhMUDVVQlqU5Wa0xCRsqTQGISsLvATkTKl0BgEfVe4iJQrhcYgZGp100IRKU8KjUHIpFOa0xCRsqTQGIRMOsXetzpo7+wudVdEREaUQmMQek673bVPh6hEpLwoNAah9wI/fa+GiJQZhcYg9N5KRJPhIlJmFBqDkA2Hp3SthoiUG4XGIOhWIiJSrhQag1CTTDAxmaBRF/iJSJlRaAxSplbXaohI+VFoDJJuJSIi5UihMUiZtG4lIiLlR6ExSNGtRBQaIlJeFBqDlEmn2L2/g44u3UpERMqHQmOQei7wa96nyXARKR8KjUHqucBv2+79Je6JiMjIUWgM0qnzjiKdquRfH3ul1F0RERkxCo1BmjoxycfPnMdDG/7Ic9v2lLo7IiIjQqFxBJafMY+pE6q4/ucvlrorIiIjQqFxBGqrq/jkWcfzqxcbefzlXaXujojIsFNoHKHLTjuWGZNSXL9mM+5e6u6IiAwrhcYRqq5KcNXZ86l/dTePbm4sdXdERIaVQmMIfKRuDsccNYFvrtlMd7f2NkRk/CoaGmY2x8weMbNNZrbRzD6Ts+zTZrY51L+RU7/GzBrCsnNz6ktDrcHMrs6pzzOzdWa2xcx+YGbJUE+F9w1h+dyhGvhQqkpU8PkPnsALO97kwQ07St0dEZFhE2dPoxP4grufCCwGrjSzBWb2Z8AFwHvd/V3A9QBmtgC4BHgXsBS4xcwSZpYAbgbOAxYAl4a2AF8HbnD3+cBuYHmoLwd2u/vxwA2h3aj05ycdzTtm1PKtn79Ip24tIiLjVNHQcPcd7v5UeN0CbAJmAVcA17l7W1i2M6xyAXCPu7e5+ytAA3BKeDS4+8vu3g7cA1xgZgYsAX4U1l8JXJjzWSvD6x8BZ4f2o06iwvjCOSfwctM+7nvqD6XujojIsBjQnEY4PLQIWAecAJwZDhv90szeH5rNAl7PWW1bqBWqTwP2uHtnn/ohnxWW7w3tR6UPLpjBSXOm8O3/eJG2zq5Sd0dEZMjFDg0zSwP3Ap919zeBSmAq0SGrvwZWhb2AfHsCPog6RZbl9m2FmdWbWX1jY+nOYDIzvnTuO9i+9wD/tu61kvVDRGS4xAoNM6siCoy73f2+UN4G3OeR9UA3kAn1OTmrzwa291NvAqaYWWWfOrnrhOWTgea+/XP329y9zt3rstlsnCENm9OPz/Anb5/GzY80sK+ts/gKIiJjSJyzpwy4Hdjk7t/KWfQTorkIzOwEIEkUAKuBS8KZT/OA+cB64AlgfjhTKkk0Wb7aoyviHgEuCp+7DLg/vF4d3hOW/8LHwBV0Xzz3HTS1tnPnb7eWuisiIkMqzp7G6cBlwBIzeyY8zgfuAI4zsw1Ek9rLwl7HRmAV8ALwM+BKd+8KcxKfAtYQTaavCm0Bvgx83swaiOYsbg/124Fpof55oPc03dHsfcdM5QMnTuc7v3
yJvfs7St0dEZEhY2PgD/cBqaur8/r6+lJ3g0073uS8Gx/jk2e9nS8tfWepuyMi0i8ze9Ld64q10xXhw+TEmZP48ElH8/9+s5WdLQdK3R0RkSGh0BhGn/vgCbR3dXPLIy+VuisiIkNCoTGM5mUm8pG62dy97lV9LayIjAsKjWH26SXzMTNu/I8tpe6KiMgRU2gMs6On1HDZ4mO596ltNOxsLXV3RESOiEJjBHzyrLdTU5XghrX6WlgRGdsUGiNgWjrF8jPm8cDzO9jwh72l7o6IyKApNEbIx//TcUyuqeL6n28udVdERAZNoTFCJlVXccVZb+fRzY2sf+Ww22eJiIwJCo0RtOy0uWRrU3xzze8Zb1fii0h5UGiMoJpkgquWHM8TW3fzyxdLdwt3EZHBUmiMsL98/zHMnlrDN9dsprtbexsiMrYoNEZYsrKCz33gBDZuf5P7ntbXworI2KLQKIELF83ipDlT+OsfPcsNa1+kS3scIjJGKDRKIFFhfP8Tp/JfFs3ixoe3sOyO9TS1tpW6WyIiRSk0SmRCspL/e/FJfP0v3sMTW5s5/8bHdCquiIx6Co0SMjP+8v3H8ONPns6EZIJLv/s43/nlS5ogF5FRS6ExCiw4ehL//ukzOPddM7juod+z4nv17NnfXupuiYgcRqExStRWV3HzR9/H//7zBfzyxUY+dNOvefb1PaXulojIIRQao4iZ8Venz2PVfz8NgIu+81tW/narrh4XkVFDoTEKLTpmKg9cdQZnzs9y7eqNfOr7T9NyoKPU3RIRUWiMVlMmJPnXy+v48tJ38rMNf+TD//wbNu14s9TdEpEyp9AYxSoqjCvOejv/9vFT2dfWyYU3/4ZVT7xe6m6JSBlTaIwBpx43jQeuOpOTj53Kl+59ji/+8Fneau8qdbdEpAwpNMaIbG2K7y0/lauWHM+9T23jwpt/w0uN+s5xERlZCo0xJFFhfP6cd3Dnx06hsbWND//Tr/n3Z7eXulsiUkYUGmPQn56Q5YGrzuCdMyfx6e8/zf/6yQbaOnW4SkSGn0JjjJo5uYZ7VizmE2fO43uPv8pFt/6O15v3l7pbIjLOKTTGsKpEBX/7oQXcdtnJbN21jw/d9BhrX3ij1N0SkXFMoTEOnPOut/HAp8/k2GkT+cRd9fyfBzfR0dVd6m6JyDik0Bgnjpk2gR/+j9O4bPGx/MuvXuaj332cP+49UOpuicg4o9AYR6qrEnztwndz4yUL2bj9Tc6/6TEe29JY6m6JyDii0BiHLlg4i9WfOoNsOsXld6zXV8qKyJBRaIxTx09P85MrT+e/Lpqtr5QVkSGj0BjHapIJrr/4vXzjL96rr5QVkSGh0BjnzIyPvH8OP/7k6UxMVXLpdx/nlkcbeL15v+5fJSIDZuPtC37q6uq8vr6+1N0YlVoOdHD1vc/zwPM7emsTkwmytSky6fCoTZJNV5OpTfbWpoflNclECXsvIsPJzJ5097pi7SpjfNAc4C7gbUA3cJu735iz/IvAN4GsuzeZmQE3AucD+4G/cvenQttlwP8Mq/69u68M9ZOBO4Ea4EHgM+7uZnYU8ANgLrAV+Ii77y46esmrtrqKf/7oIi575Vhea95PU2sbTS3tNLa20dTSxkuNrax7pY3d+/N/4dPEZIJMCJBsCJieYMn2qU9IFv3VEpExKM7/2Z3AF9z9KTOrBZ40s7Xu/kIIlA8Cr+W0Pw+YHx6nArcCp4YAuBaoAzx8zuoQArcCK4DHiUJjKfAQcDXwsLtfZ2ZXh/dfPuJRlzEzY/Fx01h83LSCbTq6utnV2k5TaxuNrW00trT1BkxTeD+QgMmkk4fuzaRTZHP2aBQwImNH0f9b3X0HsCO8bjGzTcAs4AXgBuBLwP05q1wA3OXRca/HzWyKmc0EzgLWunszgJmtBZaa2aPAJHf/XajfBVxIFBoXhPUAVgKPotAYdlWJCt42uZq3Ta4u2rZvwDS19DxHtabWNl5p2sf6V5oLBsyEZCJnby
XP3kvOHs3ElAJGxjd3p6PLaevsor2zm7bO7t7nw2tdoR49zn7ndI6eUjOs/RvQ/4FmNhdYBKwzsw8Df3D3Z6MjUr1mAblfL7ct1Pqrb8tTB5gRQgt332Fm0wfSXxl+Aw2Y5n3tNLYcDJim1oN7Lz0B88TW3TTva8/7GT0B0xMuvXswtSmyffZoFDAyUF3d+TbWXRzo6Ka9q5u2jnwb7sIb83zLex8dXb2fGT2Hdbq6GexU8+yPvX/0hIaZpYF7gc8SHbL6W+CcfE3z1HwQ9djMbAXR4S2OOeaYgawqI6gqUcGMSdXMmDSwgDkYKgf3Xhpb2ti6ax/1rxYOmJqqxKF7L7U9cy5RwOTu0ShgSsvdezeYg94wd3TRdshGeGAb8/bObjqH4CLYCoNUZYJUVQXJRAWpqgpSlYmc1xVMrqkiVZsiWRm9T1UmwnP0SIZa7/KqCpKJRM6yClJVh35msrKCKTXJIfiv0b9Y/6eYWRVRYNzt7veZ2XuAeUDPXsZs4CkzO4VoT2FOzuqzge2hflaf+qOhPjtPe4A3zGxm2MuYCezM1z93vw24DaKzp+KMSUa3IwmYptbc19GjJ2B272/P+1dcTVWid2K/J1ii133mY2pTTEwm6LN3PWa5O53d3v+GuaO7d2N8pH9l527gezfsXVF9KCRzNrw9G+LcDfOEZCVTJxxaS/bZUB+2sT9ked8N/OHLKxPj+0qGOGdPGXA7sMndvwXg7s8D03PabAXqwtlTq4FPmdk9RBPhe8NGfw3wj2Y2Nax2DnCNuzebWYuZLQbWAZcD/xTarAaWAdeF59y5ExFgYAHTGQJmZ07A5B4ea2pt49Vd+2MHTO7eSjbPHk1/AdPd7Yf8RdzW30a2wMa8vSv/Rnggf2UPxR1mKius4IY3mYhqEydWhtqhG/PkYRviQ2sFP7Pv5yQqxk2Yj2Zx9jROBy4DnjezZ0Ltb9z9wQLtHyQ63baB6JTbjwGEcPga8ERo99WeSXHgCg6ecvtQeEAUFqvMbDnRGVoXxxyXSF6ViQqmT6pm+gACprHv4bGeOZnWNl7btZ+nXt1Nc4GAqa6qIJNOkaiww/7K7ug68q21Gb0bzEMPVxz8C7i2uvKwv4rzb5grSPZsvKvyfWa+v+Cjz0xUaGNdLnRxn8gQyA2Yptb2nLPIonDpdg49Nl3gEEi+wx6HH9s+uDGvSpj+upYhMWQX94lIcQPZgxEZy8b3jI2IiAwphYaIiMSm0BARkdgUGiIiEptCQ0REYlNoiIhIbAoNERGJTaEhIiKxjbsrws2sEXh1AKtkgKZh6s5oVo7jLscxQ3mOuxzHDEc27mPdPVus0bgLjYEys/o4l86PN+U47nIcM5TnuMtxzDAy49bhKRERiU2hISIisSk0wpc3laFyHHc5jhnKc9zlOGYYgXGX/ZyGiIjEpz0NERGJraxDw8yWmtlmM2sws6tL3Z+hYmZ3mNlOM9uQUzvKzNaa2ZbwPDXUzcxuCv8Gz5nZ+0rX88Ezszlm9oiZbTKzjWb2mVAf7+OuNrP1ZvZsGPdXQn2ema0L4/6BmSVDPRXeN4Tlc0vZ/yNhZgkze9rMfhrel8OYt5rZ82b2jJnVh9qI/o6XbWiYWQK4GTgPWABcamYLSturIXMnsLRP7WrgYXefDzwc3kM0/vnhsQK4dYT6ONQ6gS+4+4nAYuDK8N9zvI+7DVji7icBC4GlZrYY+DpwQxj3bmB5aL8c2O3uxwM3hHZj1WeATTnvy2HMAH/m7gtzTq0d2d9xdy/LB3AasCbn/TXANaXu1xCOby6wIef9ZmBmeD0T2Bxe/wtwab52Y/kB3A98sJzGDUwAngJOJbrAqzLUe3/XgTXAaeF1ZWhnpe77IMY6m2gDuQT4KWDjfcyh/1uBTJ/aiP6Ol+2eBjALeD3n/bZQG69muPsOgPA8PdTH3b9DOPywCFhHGYw7HKZ5BtgJrAVeAva4e2dokju23nGH5XuBaSPb4yHxbeBLQHd4P43xP2
YAB35uZk+a2YpQG9Hf8XL+jnDLUyvHU8nG1b+DmaWBe4HPuvubZvmGFzXNUxuT43b3LmChmU0BfgycmK9ZeB7z4zaz/wzsdPcnzeysnnKepuNmzDlOd/ftZjYdWGtmv++n7bCMu5z3NLYBc3Lezwa2l6gvI+ENM5sJEJ53hvq4+XcwsyqiwLjb3e8L5XE/7h7uvgd4lGhOZ4qZ9fxRmDu23nGH5ZOB5pHt6RE7HfiwmW0F7iE6RPVtxveYAXD37eF5J9EfCKcwwr/j5RwaTwDzwxkXSeASYHWJ+zScVgPLwutlRMf8e+qXhzMtFgN7e3Z1xxKLdiluBza5+7dyFo33cWfDHgZmVgN8gGhy+BHgotCs77h7/j0uAn7h4YD3WOHu17j7bHefS/T/7S/c/b8xjscMYGYTzay25zVwDrCBkf4dL/XEToknlc4HXiQ6Bvy3pe7PEI7r+8AOoIPor43lRMdwHwa2hOejQlsjOovsJeB5oK7U/R/kmM8g2vV+DngmPM4vg3G/F3g6jHsD8HehfhywHmgAfgikQr06vG8Iy48r9RiOcPxnAT8thzGH8T0bHht7tlkj/TuuK8JFRCS2cj48JSIiA6TQEBGR2BQaIiISm0JDRERiU2iIiEhsCg0REYlNoSEiIrEpNEREJLb/DxyPGzb8xi2YAAAAAElFTkSuQmCC\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"plt.plot(candidatas_max_leaf_nodes,my_mae)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we have found the best number of leaves for our model, we no longer need to hold out validation data, so we can fit the final model on all the data we have."
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"DecisionTreeRegressor(criterion='mse', max_depth=None, max_features=None,\n",
" max_leaf_nodes=250, min_impurity_decrease=0.0,\n",
" min_impurity_split=None, min_samples_leaf=1,\n",
" min_samples_split=2, min_weight_fraction_leaf=0.0,\n",
" presort=False, random_state=1, splitter='best')"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Definir\n",
"casas_modelo_final = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes_optimo,random_state=1)\n",
"\n",
"# Ajustar\n",
"casas_modelo_final.fit(X,y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Random forest"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to avoid having to search for the optimal number of leaves is to use a different algorithm, the **random forest**. A random forest builds many trees and makes its prediction by averaging the predictions of the individual trees. It is generally more accurate than a single decision tree and works well with the default parameters."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To implement it in Python we use the `RandomForestRegressor` class from scikit-learn."
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.ensemble import RandomForestRegressor"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Error: 185910.98665805894\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/guillermo/anaconda3/lib/python3.7/site-packages/sklearn/ensemble/forest.py:246: FutureWarning: The default value of n_estimators will change from 10 in version 0.20 to 100 in 0.22.\n",
" \"10 in version 0.20 to 100 in 0.22.\", FutureWarning)\n"
]
}
],
"source": [
"# Definir\n",
"casas_modelo_forest = RandomForestRegressor(random_state=1)\n",
"\n",
"# Ajustar\n",
"casas_modelo_forest.fit(train_X, train_y)\n",
"\n",
"# Predecir\n",
"predicciones = casas_modelo_forest.predict(val_X)\n",
"\n",
"# Evaluar\n",
"print('Error: ', mean_absolute_error(val_y,predicciones))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The error is still large, but it is a clear improvement over the single decision tree (a MAE of about 186,000 versus about 238,000 for the best tree found above)."
]
},
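{
"cell_type": "markdown",
"metadata": {},
"source": [
"How a random forest combines its trees can be checked directly on a fitted model. The sketch below is a minimal illustration on made-up data (`X_toy`, `y_toy` and `X_new` are illustrative names, not the housing data). It relies on the `estimators_` attribute of a fitted `RandomForestRegressor`, which holds the individual trees, and compares the plain mean of the per-tree predictions with the output of `predict`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.ensemble import RandomForestRegressor\n",
"\n",
"# Synthetic data (assumption for illustration): a noisy linear relation\n",
"rng = np.random.RandomState(0)\n",
"X_toy = rng.uniform(0, 10, size=(200, 2))\n",
"y_toy = 3 * X_toy[:, 0] - X_toy[:, 1] + rng.normal(scale=0.5, size=200)\n",
"\n",
"forest = RandomForestRegressor(n_estimators=20, random_state=0)\n",
"forest.fit(X_toy, y_toy)\n",
"\n",
"# Predict a few points with every individual tree, then average by hand\n",
"X_new = X_toy[:5]\n",
"per_tree = np.array([tree.predict(X_new) for tree in forest.estimators_])\n",
"manual_average = per_tree.mean(axis=0)\n",
"\n",
"# The forest's own prediction should match the hand-computed average\n",
"print(np.allclose(manual_average, forest.predict(X_new)))"
]
},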
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Final task"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Try your luck in a Kaggle competition."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}