! Copyright (c) 2012 Anonymous
! See http://factorcode.org/license.txt for BSD license.
USING: assocs fry io.encodings.utf8 io.files kernel sequences
sets splitting vectors ;
IN: rosetta-code.inverted-index

! http://rosettacode.org/wiki/Inverted_index

! An Inverted Index is a data structure used to create full text
! search.

! Given a set of text files, implement a program to create an
! inverted index. Also create a user interface to do a search
! using that inverted index which returns a list of files that
! contain the query term / terms. The search index can be in
! memory.
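
! Read a UTF-8 file and split it into words on whitespace and punctuation.
: file-words ( file -- seq )
    utf8 file-contents " ,;:!?.()[]{}\n\r" split harvest ;

! Add file to an existing file list, or start a new list if there is none.
: add-to-file-list ( files file -- files )
    over [ swap [ adjoin ] keep ] [ nip 1vector ] if ;

! Record file under every word in words.
: add-to-index ( words index file -- )
    '[ _ [ _ add-to-file-list ] change-at ] each ;

! Index each file in files into the given hashtable.
: (index-files) ( files index -- )
    [ [ [ file-words ] keep ] dip swap add-to-index ] curry each ;

! Build a fresh index mapping each word to the files that contain it.
: index-files ( files -- index )
    H{ } clone [ (index-files) ] keep ;

! Return the files that contain every one of the given terms.
: query ( terms index -- files )
    [ at ] curry map [ ] [ intersect ] map-reduce ;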
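
! Example listener session, as a sketch only. The file names
! "a.txt" and "b.txt" are hypothetical and must exist on disk;
! `.` comes from the prettyprint vocabulary, which this file
! does not load itself.
!
!     { "a.txt" "b.txt" } index-files
!     { "hello" "world" } swap query .
!
! The first line leaves the index on the stack; `swap` then puts
! the terms and the index in the order `query` expects. Terms
! are matched exactly as they appear in the files, so the lookup
! is case-sensitive.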