! Copyright (C) 2012 John Benediktsson
! See http://factorcode.org/license.txt for BSD license
-USING: assocs csv io.encodings.utf8 io.files kernel math.parser
-sequences ;
+USING: accessors ascii assocs csv io.encodings.utf8 io.files
+kernel math.parser sequences splitting ;
IN: machine-learning.data-sets
"resource:extra/machine-learning/data-sets/" prepend
utf8 file-contents ;
+: load-table ( name -- data names )
+ load-file [ blank? ] trim string-lines
+ [ [ blank? ] split-when ] map unclip
+ [ [ [ string>number ] map ] map ] dip ;
+
PRIVATE>
: load-iris ( -- data-set )
"sepal length (cm)" "sepal width (cm)"
"petal length (cm)" "petal width (cm)"
} <data-set> ;
+
+: load-linnerud ( -- data-set )
+ data-set new
+ "linnerud_exercise.csv" load-table
+ [ >>data ] [ >>feature-names ] bi*
+ "linnerud_physiological.csv" load-table
+ [ >>target ] [ >>target-names ] bi*
+ "linnerud.rst" load-file >>description ;
+
--- /dev/null
+Linnerud dataset
+
+Notes
+-----
+Data Set Characteristics:
+ :Number of Instances: 20
+ :Number of Attributes: 3
+ :Missing Attribute Values: None
+
+The Linnerud dataset contains two small datasets:
+
+- *exercise*: 20 observations on 3 exercise variables:
+ Chins, Situps and Jumps.
+
+- *physiological*: 20 observations on 3 physiological variables:
+ Weight, Waist and Pulse.
+
+References
+----------
+ * http://rgm2.lab.nig.ac.jp/RGM2/func.php?rd_id=mixOmics:linnerud
+ * Tenenhaus, M. (1998). La régression PLS: théorie et pratique. Paris: Editions Technip.