There are many packages in R for text processing.
With them it is possible to analyze a text, extract its most common words and visualize that set of words as a cloud. This is where the name of this type of visualization comes from: the word cloud.
The R code for the word cloud visualization needs two packages: tm for the text processing and wordcloud for the drawing. Install them if you do not have them yet, then load them:
library(tm)
library(wordcloud)
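The library() calls only load packages that are already installed. As a minimal sketch of the installation step, the snippet below checks which of the two packages are missing. The helper name missing_packages is made up for this example, and the actual install.packages() call is left commented out so that nothing is downloaded by accident.

```r
# Hypothetical helper (not part of tm or wordcloud): returns the names of
# the packages in `pkgs` that cannot be found on this machine.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("tm", "wordcloud"))

if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # Uncomment the next line to install them from CRAN:
  # install.packages(to_install)
}
```

Once both packages are installed, the library() lines above load them without complaint.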
What about the text we are going to visualize? It is the Wikipedia page of a country. We will find out which country it is from the word cloud itself. I copied that page into a .txt file. So here is the code. It is commented in order to explain what each line does.
# read the text file, line by line
page = readLines("italy.txt")
# produce a corpus of the text
corpus = Corpus(VectorSource(page))
# convert all of the text to lower case (standard practice for text processing);
# content_transformer() keeps every document a text document, so that
# TermDocumentMatrix() below does not stop with
# Error: inherits(doc, "TextDocument") is not TRUE
corpus = tm_map(corpus, content_transformer(tolower))
# remove any kind of punctuation
corpus = tm_map(corpus, removePunctuation)
# remove all the numbers
corpus = tm_map(corpus, removeNumbers)
# remove English stop words
corpus = tm_map(corpus, removeWords, stopwords("english"))
# create a term-document matrix (one row per word)
dtm = TermDocumentMatrix(corpus)
# convert the term-document matrix to a standard matrix
m = as.matrix(dtm)
# sum the counts of each word and sort them, highest first,
# so that the most frequent words end up as the biggest
v = sort(rowSums(m), decreasing = TRUE)
# finally produce the word cloud
wordcloud(names(v), v, min.freq = 10)

Go ahead and try the example. If you pass tolower to tm_map directly, recent versions of tm may complain at TermDocumentMatrix with the error quoted in the comments. My first workaround was to turn the corpus back into text documents with tm_map(corpus, PlainTextDocument) and build the matrix a second time. Wrapping tolower in content_transformer avoids the error in the first place.

Here is the source of this article: http://www.datatreemap.com/vis4r/wordcloud_in_R.php

For more examples of collecting, analyzing and visualizing data: http://www.datatreemap.com

P.S. Did you find out which country the Wikipedia page is about? Italy of course ;-)
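For the curious, the counting part of this recipe can also be sketched in base R alone, without tm. This is only an illustration of the idea, not what tm does internally. The two sample sentences and the tiny stop word list are made up for the example; stopwords("english") from tm is much longer.

```r
# Hypothetical sample text standing in for the lines of italy.txt
page <- c("Italy is a country in Southern Europe.",
          "The capital of Italy is Rome, and Rome is in Italy.")

# Hand-picked stop words, for illustration only
stop_words <- c("is", "a", "in", "the", "of", "and")

text  <- tolower(paste(page, collapse = " "))  # lower case
text  <- gsub("[[:punct:]]", "", text)         # remove punctuation
text  <- gsub("[[:digit:]]", "", text)         # remove numbers
words <- strsplit(text, "[[:space:]]+")[[1]]   # split into single words
words <- words[nzchar(words) & !(words %in% stop_words)]

# count every word and sort the counts, highest first
v <- sort(table(words), decreasing = TRUE)
print(v)
```

Here "italy" comes out on top with 3 occurrences, followed by "rome" with 2. With the wordcloud package loaded, names(v) and as.vector(v) can be passed to wordcloud() in the same way as in the tm version.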