This dataset has 506 observations and 21 variables. It contains information collected by the U.S. Census Service concerning housing in the area of Boston, Massachusetts. It was obtained from the StatLib archive http://lib.stat.cmu.edu/datasets/boston_corrected.txt and has been used extensively throughout the literature to benchmark algorithms. The details of this dataset were sourced from https://geodacenter.github.io/data-and-lab/boston-housing/. The data were originally published in Harrison Jr and Rubinfeld (1978) and subsequently corrected in Gilley et al. (1996).

boston_housing

Format

A tibble with 506 rows and 21 variables:

obs

Sequential ID

town

A factor with levels given by town names

town_no

A numeric vector of town ID numbers corresponding to town

tract

A numeric vector of tract ID numbers

lon

A numeric vector of tract point longitudes in decimal degrees

lat

A numeric vector of tract point latitudes in decimal degrees

medv

A numeric vector of median values of owner-occupied housing in thousands of USD

cmedv

A numeric vector of corrected median values of owner-occupied housing in thousands of USD

crim

A numeric vector of per capita crime rates

zn

A numeric vector of proportions of residential land zoned for lots over 25,000 sq. ft. per town (constant for all Boston tracts)

indus

A numeric vector of proportions of non-retail business acres per town (constant for all Boston tracts)

chas

A factor with levels 1 if the tract borders the Charles River and 0 otherwise

nox

A numeric vector of nitric oxides concentrations (parts per 10 million) per town

rm

A numeric vector of average numbers of rooms per dwelling

age

A numeric vector of proportions of owner-occupied units built prior to 1940

dis

A numeric vector of weighted distances to five Boston employment centers

rad

A numeric vector of indices of accessibility to radial highways per town (constant for all Boston tracts)

tax

A numeric vector of full-value property-tax rates per USD 10,000 per town (constant for all Boston tracts)

ptratio

A numeric vector of pupil-teacher ratios per town (constant for all Boston tracts)

b

A numeric vector of 1000*(Bk - 0.63)^2, where Bk is the proportion of the population that is Black, by town

lstat

A numeric vector of percentages of the population with lower status
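The b column stores a transformed quantity rather than Bk itself. The sketch below is illustrative and not part of this package: it uses hypothetical helper names (b_from_bk, bk_from_b) and assumes only the formula stated for b. It computes b from a proportion Bk and inverts it, which shows that the mapping is not one-to-one.

```python
import math


def b_from_bk(bk: float) -> float:
    """Compute b = 1000 * (Bk - 0.63)**2 for a proportion Bk in [0, 1]."""
    if not 0.0 <= bk <= 1.0:
        raise ValueError("Bk must be a proportion in [0, 1]")
    return 1000.0 * (bk - 0.63) ** 2


def bk_from_b(b: float) -> list[float]:
    """Return every Bk in [0, 1] consistent with a given b (at most two)."""
    if b < 0:
        raise ValueError("b cannot be negative")
    d = math.sqrt(b / 1000.0)  # distance of Bk from 0.63
    # Both 0.63 - d and 0.63 + d satisfy the formula; keep those in [0, 1].
    return sorted({r for r in (0.63 - d, 0.63 + d) if 0.0 <= r <= 1.0})


if __name__ == "__main__":
    print(b_from_bk(0.5))     # about 16.9
    print(bk_from_b(16.9))    # about [0.5, 0.76]: two proportions give the same b
    print(bk_from_b(280.9))   # about [0.1]: the other root (1.16) is not a proportion
```

Because the formula is a parabola centered at 0.63, any b up to about 136.9 (that is, 1000 * 0.37^2) corresponds to two valid proportions. The original Bk therefore cannot be recovered from b alone.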
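The tax column is a rate in dollars per USD 10,000 of full value, while medv and cmedv are in thousands of USD. The following is a minimal arithmetic sketch of how these units combine. The function names and input values are hypothetical and chosen only for illustration. Applying a town-level rate to a tract's median value gives only a rough estimate of the annual tax on a median home.

```python
def tax_rate_percent(tax: float) -> float:
    """Convert a rate per USD 10,000 of full value into a percentage."""
    return tax / 100.0


def annual_tax_usd(tax: float, value_usd_thousands: float) -> float:
    """Approximate annual property tax in USD.

    tax: full-value property-tax rate per USD 10,000 (units of the tax column).
    value_usd_thousands: a property value in thousands of USD (units of medv/cmedv).
    """
    value_usd = value_usd_thousands * 1000.0
    return value_usd * tax / 10_000.0


if __name__ == "__main__":
    # Hypothetical inputs: a rate of 296 per USD 10,000 and a value of 24.0 (USD 24,000).
    print(tax_rate_percent(296))      # about 2.96 (percent)
    print(annual_tax_usd(296, 24.0))  # about 710.4 (USD per year)
```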

Source

http://lib.stat.cmu.edu/datasets/boston_corrected.txt

References

Gilley OW, Pace RK (1996). “On the Harrison and Rubinfeld data.” Journal of Environmental Economics and Management, 31(3), 403–405.

Harrison Jr D, Rubinfeld DL (1978). “Hedonic housing prices and the demand for clean air.” Journal of Environmental Economics and Management, 5(1), 81–102.