Introduction to Text Mining

An overview of text mining tools and techniques.

str_detect(string, pattern)

str_detect(string, pattern) checks each string for a match to the pattern specified by the user. The output is a logical vector with one value per input string: TRUE indicates that the string contains a match and FALSE indicates that it does not.
# Setup: this runs each time the exercise is initialized
library(stringr)

play <- c("Hamlet", "Romeo & Juliet", "Romeo & Juliet", "The Merchant of Venice",
          "King Henry IV", "Julius Caesar", "Macbeth", "King Lear")
quote <- c("What a piece of work is man! how noble in reason! how infinite in faculty! in form and moving how express and admirable! in action how like an angel! in apprehension how like a god! the beauty of the world, the paragon of animals",
           "What's in a name? That which we call a rose by any other name would smell as sweet",
           "Tempt not a desperate man",
           "The devil can cite Scripture for his purpose",
           "A man can die but once",
           "But, for my own part, it was Greek to me",
           "Double, double toil and trouble; Fire burn, and cauldron bubble",
           "Nothing will come of nothing")
shakespeare <- data.frame(play, quote)

# This exercise uses the shakespeare data frame.
# Use str_detect() to detect which items in shakespeare$quote contain the word "man".
# Pass the quote column, not the whole data frame, so each quote is tested separately.
str_detect(shakespeare$quote, "man")

# str_detect() returns a logical vector: TRUE marks a quote that contains the
# pattern and FALSE marks a quote that does not.
Hint: use str_detect() to return a logical vector denoting the presence of the pattern.
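
Because str_detect() matches substrings, a pattern such as "man" also matches inside longer words. The sketch below illustrates this with a small hypothetical vector (the name phrases and its contents are assumptions for illustration, not part of the exercise data) and shows how word boundaries and regex(ignore_case = TRUE) can narrow or widen the match.

```r
library(stringr)

# Hypothetical sample strings, not part of the shakespeare data frame
phrases <- c("A man can die but once",
             "Frailty, thy name is woman",
             "Man delights not me")

# A plain pattern matches substrings, so "woman" counts as a match
substring_hits <- str_detect(phrases, "man")
substring_hits
# TRUE TRUE FALSE

# \\b marks a word boundary, so only the standalone word "man" matches
word_hits <- str_detect(phrases, "\\bman\\b")
word_hits
# TRUE FALSE FALSE

# regex(ignore_case = TRUE) also accepts the capitalised "Man"
any_case_hits <- str_detect(phrases, regex("\\bman\\b", ignore_case = TRUE))
any_case_hits
# TRUE FALSE TRUE
```

Which variant is appropriate depends on whether partial-word matches should count for the analysis at hand.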

str_which(string, pattern)

str_which(string, pattern) returns an integer vector containing the indexes of the strings that match the pattern specified by the user.
# Setup: this runs each time the exercise is initialized
library(stringr)

play <- c("Hamlet", "Romeo & Juliet", "Romeo & Juliet", "The Merchant of Venice",
          "King Henry IV", "Julius Caesar", "Macbeth", "King Lear")
quote <- c("What a piece of work is man! how noble in reason! how infinite in faculty!",
           "What's in a name? That which we call a rose by any other name would smell as sweet",
           "Tempt not a desperate man",
           "The devil can cite Scripture for his purpose",
           "A man can die but once",
           "But, for my own part, it was Greek to me",
           "Double, double toil and trouble; Fire burn, and cauldron bubble",
           "Nothing will come of nothing")
shakespeare <- data.frame(play, quote)

# Use str_which() to view the indexes of the quotes in the shakespeare data frame
# that match the pattern "man".
# Pass the quote column, not the whole data frame, so the indexes refer to quotes.
str_which(shakespeare$quote, "man")
Hint: use str_which(shakespeare$quote, "man") to return the correct result.
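
The indexes returned by str_which() are the positions at which str_detect() is TRUE, so they can be used directly to subset the original vector. A minimal sketch, assuming a shortened vector named quotes (the name is an assumption; the strings are taken from the exercise data):

```r
library(stringr)

# Three of the quotes from the exercise, stored in a plain character vector
quotes <- c("Tempt not a desperate man",
            "The devil can cite Scripture for his purpose",
            "A man can die but once")

# Indexes of the elements that contain "man"
idx <- str_which(quotes, "man")
idx
# 1 3

# The same indexes obtained from the logical vector of str_detect()
same_idx <- which(str_detect(quotes, "man"))
identical(idx, same_idx)
# TRUE

# Use the indexes to pull out the matching quotes
matching_quotes <- quotes[idx]
matching_quotes
```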

str_count(string, pattern)

str_count(string, pattern) returns the number of matches of the user-specified pattern within each string, as an integer vector with one count per input string.
# Setup: this runs each time the exercise is initialized
library(stringr)

play <- c("Hamlet", "Romeo & Juliet", "Romeo & Juliet", "The Merchant of Venice",
          "King Henry IV", "Julius Caesar", "Macbeth", "King Lear")
quote <- c("What a piece of work is man! how noble in reason! how infinite in faculty! in form and moving how express and admirable! in action how like an angel! in apprehension how like a god! the beauty of the world, the paragon of animals",
           "What's in a name? That which we call a rose by any other name would smell as sweet",
           "Tempt not a desperate man",
           "The devil can cite Scripture for his purpose",
           "A man can die but once",
           "But, for my own part, it was Greek to me",
           "Double, double toil and trouble; Fire burn, and cauldron bubble",
           "Nothing will come of nothing")
shakespeare <- data.frame(play, quote)

# This exercise uses the shakespeare data frame.
# Use str_count() to count the number of pattern matches for "man" in each quote.
# Pass the quote column, not the whole data frame, so each quote is counted separately.
str_count(shakespeare$quote, "man")

# str_count() counts the pattern matches at each individual index rather than
# the total across the quote vector.
Hint: use str_count() to count the number of pattern matches at each index.
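
Since str_count() reports one count per string, a total across the whole vector has to be computed separately, for example with sum(). The sketch below uses two quotes from the exercise in an assumed vector named quotes, and also shows how regex(ignore_case = TRUE) changes the count when a word appears in both upper and lower case.

```r
library(stringr)

# Two of the quotes from the exercise, stored in a plain character vector
quotes <- c("Double, double toil and trouble; Fire burn, and cauldron bubble",
            "Nothing will come of nothing")

# Case-sensitive count: only the lowercase "nothing" is matched
exact_counts <- str_count(quotes, "nothing")
exact_counts
# 0 1

# Case-insensitive count: "Nothing" and "nothing" are both matched
any_case_counts <- str_count(quotes, regex("nothing", ignore_case = TRUE))
any_case_counts
# 0 2

# Total number of matches across all quotes
total_matches <- sum(any_case_counts)
total_matches
# 2
```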

str_locate(string, pattern)

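str_locate(string, pattern) returns the start and end positions of the first match of the pattern within each string. The result is an integer matrix with one row per string and the columns start and end; strings without a match produce NA in both columns.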
# Setup: this runs each time the exercise is initialized
library(stringr)

play <- c("Hamlet", "Romeo & Juliet", "Romeo & Juliet", "The Merchant of Venice",
          "King Henry IV", "Julius Caesar", "Macbeth", "King Lear")
quote <- c("What a piece of work is man! how noble in reason! how infinite in faculty! in form and moving how express and admirable! in action how like an angel! in apprehension how like a god! the beauty of the world, the paragon of animals",
           "What's in a name? That which we call a rose by any other name would smell as sweet",
           "Tempt not a desperate man",
           "The devil can cite Scripture for his purpose",
           "A man can die but once",
           "But, for my own part, it was Greek to me",
           "Double, double toil and trouble; Fire burn, and cauldron bubble",
           "Nothing will come of nothing")
shakespeare <- data.frame(play, quote)

# This exercise uses the shakespeare data frame.
# Use str_locate() to show the position of matches for the pattern "man" within each quote.
# Pass the quote column, not the whole data frame, so each quote gets its own row.
str_locate(shakespeare$quote, "man")

# str_locate() shows the start and end positions of a matched pattern within a
# string. In the first quote, the pattern starts at position 25 and ends at position 27.
Hint: use str_locate() to return the start and end positions of the match in each string.
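
str_locate() reports only the first match in each string. When a pattern can occur more than once, str_locate_all() returns every match, and the positions can be passed to str_sub() to extract the matched text. A minimal sketch, assuming one quote from the exercise stored in a variable named line_text (the variable names are assumptions for illustration):

```r
library(stringr)

# One quote from the exercise in which "name" appears twice
line_text <- "What's in a name? That which we call a rose by any other name would smell as sweet"

# str_locate() returns a one-row matrix with the first match only
first_match <- str_locate(line_text, "name")
first_match
# start = 13, end = 16

# str_locate_all() returns a list with one matrix per input string;
# each row of the matrix is one match
every_match <- str_locate_all(line_text, "name")[[1]]
nrow(every_match)
# 2

# Use the start and end positions to extract the matched text
extracted <- str_sub(line_text, first_match[1, "start"], first_match[1, "end"])
extracted
# "name"
```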