Okay, so I’ve been wanting to mess around with the lyrics from Hamilton, you know, the musical? I thought it would be cool to do some text analysis, maybe find out the most common words, or see how the vocabulary changes throughout the show. So, I decided to get my hands on the script and play around with it.

Getting the Script
First things first, I needed to get the actual words. I did a quick search online, but surprisingly, finding a clean, ready-to-use text file of the entire script wasn’t as easy as I thought. I found some websites with the lyrics, but they were all formatted for reading, not for data analysis. You know, with line breaks and character names and all that stuff.
So, I ended up copying and pasting the lyrics from one of these websites. Yeah, it was a bit tedious, but it worked. I pasted it into a plain text editor to get rid of any weird formatting.
Cleaning Up the Mess
Next up was cleaning the data. This is where it got a little messy. I had to manually go through the text and remove things like:
- Character Names: Every time a character started speaking, their name was there. Had to get rid of all those “HAMILTON:” and “BURR:” entries.
- Stage Directions: Stuff like “[ENSEMBLE]” or “[COMPANY enters]” had to go. I just wanted the words they were singing/saying.
- Punctuation: I decided to remove all punctuation, like commas, periods, question marks, etc. I figured it would make the analysis simpler, focusing just on the words themselves. I might change my mind about this later, though!
- Line Breaks: The text wasn’t ready for analysis with a line break after every lyric, so I had to delete all of them.
I basically used find and replace, and a lot of manual checking. It wasn’t pretty, but it got the job done. I’m sure there are fancier ways to do this with code, but I’m just starting out, so I stuck with what I knew.
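The cleanup steps above could probably be scripted, too. Here’s a rough Python sketch of that idea. It is not what was actually done here (that was all find and replace by hand). The function name clean_script and the sample text are made up for illustration, and it assumes speaker labels are all-caps names at the start of a line followed by a colon, and that stage directions always sit in square brackets. A real pasted script may not be that tidy.

```python
import re
import string


def clean_script(raw: str) -> str:
    """Strip stage directions, speaker labels, punctuation, and line breaks."""
    # Drop anything in square brackets, e.g. "[ENSEMBLE]" or "[COMPANY enters]".
    text = re.sub(r"\[[^\]]*\]", " ", raw)

    # Drop all-caps speaker labels at the start of a line, e.g. "HAMILTON:".
    # Assumption: labels are uppercase and end with a colon.
    text = re.sub(r"^[A-Z][A-Z .'/&-]*:[ \t]*", "", text, flags=re.MULTILINE)

    # Remove punctuation: ASCII punctuation plus curly quotes and ellipsis.
    # Note this also strips apostrophes, so "don't" becomes "dont".
    to_strip = string.punctuation + "“”‘’…"
    text = text.translate(str.maketrans("", "", to_strip))

    # Replace line breaks (and any other runs of whitespace) with one space,
    # so words on neighboring lines don't get glued together.
    return " ".join(text.split())


# Made-up sample in the same shape as the pasted lyrics (not real script text).
sample = "[COMPANY enters]\nHAMILTON: Hello there, my friend!\nBURR: Wait... what?\n"
print(clean_script(sample))  # Hello there my friend Wait what
```

One design choice worth flagging: the sketch swaps line breaks for spaces instead of deleting them outright, since plain deletion would merge the last word of one line with the first word of the next.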
The Result (So Far)
After all that cleaning, I finally had a text file with just the words from the Hamilton script. It’s not perfect, but it’s a good starting point. My next step is to actually start analyzing the text. I’m thinking of using some simple Python scripts to count word frequencies and stuff like that. I’ll keep you posted on how that goes!
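As a preview of that next step, counting word frequencies might look something like the sketch below. It uses Counter from Python’s standard library. The function name top_words and the sample string are placeholders, and it assumes the cleaned text is already just words separated by spaces.

```python
from collections import Counter


def top_words(cleaned: str, n: int = 10) -> list[tuple[str, int]]:
    """Return the n most common words with their counts."""
    # Lowercase first so "The" and "the" are counted as the same word.
    words = cleaned.lower().split()
    return Counter(words).most_common(n)


# Placeholder text standing in for the cleaned script.
print(top_words("the room where the thing happens the room", n=2))
# [('the', 3), ('room', 2)]
```

With the real file, the cleaned text would be read in first and passed to the function in place of the placeholder string.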

It was definitely a bit more work than I expected just to get the data ready, but hey, that’s often the case with real-world projects, right? You gotta get your hands dirty sometimes!