379
1. Read the Elder Research and Unilever case studies and note the types of information they were able to extract from the text data. (We don’t quite have that amount of information available, or quite that level of software, so your conclusions won’t be quite as detailed.)
a. Elder Research Inc. (2013). Improving customer retention and profitability for a regional provider of wireless services. Retrieved from http://datamininglab.com/images/case-studies/ERI_nTelos_Customer_Retention_Case_Study.pdf
b. Jiffy Lube Uses OdinText Software to Increase Revenue. http://odintext.com/wp-content/uploads/2015/10/odinText-Shell.pdf . There’s also an interesting interview video here: https://www.youtube.com/embed/2Zxmjir8Zwo?autoplay=1
c. These case studies are for inspiration only; there is nothing to turn in from them.
2. Choose a readily available text item which recurred over at least three time periods, spaced some distance apart. You want to make sure you have at least several hundred words of text (approximately one – two pages) for each time period, and you want something which will change noticeably over that time period. Some options could include:
a. The CEO’s letter to shareholders (I do Amazon.com below as an example – you may choose any other company) in three different years.
b. The State of the Union Address (or other political speeches) from three different Presidents, such as those from Woodrow Wilson, Lyndon B. Johnson, and Barack Obama
c. A writeup of something technical (like descriptions of Motor Trend’s Car of the Year and Finalists) from 1950, 1980, and 2010.
d. Some industry writeup (such as PC Magazine’s best new computers for 1979, 1989, and 1999).
3. Match your timeline to the subject. For political speeches, you will get the best results if they are at least 50 years apart. For faster-changing items (such as the cellular phone user’s manual), you can probably get away with things 10 years apart. You will have a much easier time making good graphs if you give yourself good raw data to work with.
4. Select your desired number of time periods. You must have at least three, and can use as many as you like. (i.e. 2000, 2005, and 2010 would be three time periods.)
5. Take your text items for each year and convert them into a text input file.
6. Run a Python program to determine the top X word count for each year. You will need to determine how many words you are going to use in your analysis; you should probably have somewhere between 5 and 30.
a. You will need to make decisions about stop words.
b. Make sure the bulk of your text processing is done in Python, not using the “search/replace” functions in Excel or Word. Part of this class’ skillset is exposure to Python, and this is how you should do it here.
7. Merge your Python word count data into an input file for Tableau. You may find it helpful to use Excel or some other tool for this.
8. Analyze your data. Emphasis here will be placed on visual analysis and text analysis.
9. As part of your analysis, take the top 3 interesting relevant words from your latest time period. (You can use some judgment here; for Amazon, “kindle” would be more interesting than “book” even if “book” had more occurrences.) Trace the trajectory of each of these 3 words over time – for example, at Amazon, the word “kindle” gains tremendously in popularity over time.
10. Create a managerial report outlining your findings.
A successful report will
· Contain a title page and a list of references and pass a plagiarism check in Turnitin
· Begin with a results-filled Executive Summary (half a page to one page)
· Be otherwise 5-10 pages in length (I will stop reading after page 10). So if you have 1 title page, 1 list of references, and 1 page of Executive Summary, you could turn in up to 13 pages of stuff.
· Contain reasonable typeface and margins (12-point Times New Roman with 1-inch margins work just fine).
· Show mastery of the readings in the class to date
· Showcase your data visualization skills in Tableau, with up to 5 graphs (I will not look at a sixth or further graph)
· Contain a list of your top 3 words, with your reasons for choosing them, and trace their trajectory over time
· Analyze anything else you find interesting
· Integrate your charts with your conclusions
· Give a busy executive a clear path to follow in terms of action items and a “to do” list
Please submit:
· All Python code you used (it’s OK if you simply used or modified the code I have here; just ensure the comments reflect your name and what you did.)
· All .txt text input files you used (I imagine you might have one input file for each year)
· All Excel (or other format) input files you used for Tableau, so I can see how your data shaped up
· Your managerial report (ensure the bibliographic reference include the source data; my report should include a reference to the Amazon.com shareholder letter). Name this file “XXXX-text-analysis” where XXXX is your name.
Good luck!