
Introduction to Text Mining

An overview of text mining tools and techniques.

str_length(string)

str_length(string) returns the number of characters in a string. Note that str_length() also counts any whitespace contained within the string.
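As a quick illustration of the whitespace point, here is a minimal sketch using a few made-up strings (the `titles` vector is hypothetical and not part of the exercise data). It assumes the stringr package is installed.

```r
library(stringr)

# Hypothetical example strings (not part of the exercise data).
titles <- c("Hamlet", "King Lear", " King Lear ")

# Every character is counted, including interior, leading, and trailing spaces.
title_lengths <- str_length(titles)
print(title_lengths)  # 6, 9, 11
```

The second and third strings differ only by one leading and one trailing space, which is why their lengths differ by two.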

    # Setup: load stringr and build the example data frame.
    library(stringr)

    play = c("Hamlet", "Romeo & Juliette", "Romeo & Julliette",
             "The Merchant of Venice", "King Henry IV", "Julius Ceasar",
             "MacBeth", "King Lear", "Richard III", "Julius Ceasar",
             "Othello", "Twelfth Night", "Hamlet", "Timon of Athens",
             "King Lear", "As You Like It", "Measure for Measure",
             "Twelfth Night")

    quote = c(
      "What a piece of work is man! how noble in reason! how infinite in faculty! in form and moving how express and admirable! in action how like an angel! in apprehension how like a god! the beauty of the world, the paragon of animals",
      "What's in a name? That which we call a rose by any other name would smell as sweet",
      "Tempt not a desperate man",
      "The devil can cite Scripture for his purpose",
      "A man can die but once",
      "But, for my own part, it was Greek to me",
      "Double, double toil and trouble; Fire burn, and cauldron bubble",
      "Nothing will come of nothing",
      "Now is the winter of our discontent",
      "Cowards die many times before their deaths; the valiant never taste of death but once",
      "I am one who loved not wisely but too well",
      "If music be the food of love play on",
      "We know what we are, but know not what we may be",
      "We have seen better days",
      "I am a man more sinned against than sinning",
      "All the world's a stage, And all the men and women merely players: They have their exits and their entrances; And one man in his time plays many parts",
      "Some rise by sin, and some by virtue fall",
      "Some are born great, some achieve greatness, and some have greatness thrust upon them"
    )

    shakespeare = data.frame(play, quote)

    # Use str_length(string) to find the number of characters in each string (including whitespace).
    # Apply it to the 'play' column of the shakespeare data frame.
    str_length(shakespeare$play)

    # Apply it to the 'quote' column as well.
    str_length(shakespeare$quote)

str_pad(string, width, side)

str_pad() pads a string with whitespace until it reaches a given width. The padding can go on the left, the right, or both sides of the string.

Arguments: 

string = the variable or column containing the string you wish to pad.

width = an integer giving the total length of the string once it has been padded with whitespace. Note, specifying width = 30 on a string with 10 characters will result in a string of 30 characters. Specifying width = 30 on a string containing 20 characters will also return a padded string with 30 characters; the difference is the amount of whitespace added. A string that is already longer than width is returned unchanged.

side = the side you want to pad the string on. You can specify "left" (the default), "right", or "both".
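The effect of each argument can be sketched with two titles of different lengths. The variable names below are hypothetical and chosen only for illustration, and the sketch assumes the stringr package is installed.

```r
library(stringr)

short_title <- "Hamlet"                  # 6 characters
long_title  <- "The Merchant of Venice"  # 22 characters

# Pad the 6-character title to width 10 on each side in turn.
pad_right <- str_pad(short_title, width = 10, side = "right")  # "Hamlet    "
pad_left  <- str_pad(short_title, width = 10, side = "left")   # "    Hamlet"
pad_both  <- str_pad(short_title, width = 10, side = "both")   # "  Hamlet  "

# Both titles reach the same total length; only the amount of padding differs.
padded <- str_pad(c(short_title, long_title), width = 30, side = "both")
print(str_length(padded))  # 30, 30

# A width smaller than the string does not truncate it.
unchanged <- str_pad(long_title, width = 10)
print(unchanged == long_title)  # TRUE
```

Padding to a common width is mostly useful for aligned printed output; it does not change the text itself.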

    # Uses the shakespeare data frame built in the str_length() exercise above.
    library(stringr)

    # Use str_pad() on the 'play' column of the shakespeare data frame.
    # Pad the strings to length 30 on both sides and assign the result to 'pad'.
    pad = str_pad(shakespeare$play, 30, "both")

    # The code below shows the new strings and their lengths.
    # Notice how each string has a total length of 30 characters,
    # regardless of the length of the initial string.
    pad
    str_length(pad)

str_trim(string, side)

str_trim() removes whitespace from the start and end of a string; whitespace between words is left in place. This is useful for eliminating inconsistencies introduced by human error.

Arguments:

string = the variable or column containing the string you wish to trim.

side = the side you want to trim whitespace from. You can specify "right", "left", or "both" (the default).
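A minimal sketch of the three side options, using a made-up string with stray spaces (the `messy` and `entries` values are hypothetical and not part of the exercise data). It assumes the stringr package is installed.

```r
library(stringr)

# Hypothetical input with two leading and three trailing spaces.
messy <- "  King Lear   "

trim_left  <- str_trim(messy, side = "left")   # "King Lear   "
trim_right <- str_trim(messy, side = "right")  # "  King Lear"
trim_both  <- str_trim(messy, side = "both")   # "King Lear"

# Trimming makes entries that differ only by stray spaces compare as equal.
entries <- c("Hamlet", "Hamlet ", " Hamlet")
print(unique(str_trim(entries)))  # "Hamlet"
```

The space between "King" and "Lear" survives in every case, since only leading and trailing whitespace is removed.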

    # Uses the shakespeare data frame built in the str_length() exercise above.
    library(stringr)
    pad = str_pad(shakespeare$play, 30, "both")

    # str_trim(string, side) will delete whitespace from the left, right, or both sides of a string.
    # Just as with str_pad(), you choose which side to work on.
    # Use str_trim() to trim both sides of 'pad'.
    str_trim(pad, "both")