Okay, so today I’m gonna walk you through my little adventure trying to get some data on Miami Marlins vs. Atlanta Braves games. Figured I’d share the gritty details, the stumbles, and the eventual wins. Let’s dive in!

First thing I did, naturally, was hit up the internet. I mean, where else would you start? I punched “Miami Marlins vs Atlanta Braves matches” into the search bar. Obvious, right? I just wanted to see what kind of official sources were out there: scores, schedules, the whole shebang.
Next, I started poking around those search results. You know, clicking on links that looked promising. ESPN, some sports news sites – the usual suspects. I was mainly looking for tables, or anything that seemed like structured data I could copy and paste into a spreadsheet. I wanted dates, scores, and maybe even some game stats if I was lucky.
The problem? Websites these days are all fancy with their JavaScript-loaded, dynamic content. Copy-pasting from a webpage just gives you a jumbled mess most of the time. I needed something more robust. That’s when I started thinking about APIs.
So, back to the search engine! This time I searched for “MLB API”. APIs are basically ways to ask a website for data in a structured way. It’s like ordering food at a restaurant instead of trying to raid the kitchen. After some digging, I stumbled upon a couple of potentially useful APIs, but most of them required some kind of signup or even payment. I was trying to do this on the cheap, so I kept looking. It was tedious, lemme tell ya.
Eventually, I found one that seemed reasonably open and had some documentation. It wasn’t super user-friendly, but hey, beggars can’t be choosers. I started messing around with it using `curl` in my terminal. `curl` is a command-line tool that lets you make requests to web servers. It’s kinda geeky, but super useful.
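For anyone who hasn’t touched it, the basic usage is a one-liner (placeholder URL here, obviously):

```bash
# Fetch a URL; -s hides the progress meter so you only see the response
curl -s "https://example.com/api/endpoint"
```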

My first few `curl` commands were a disaster. I kept getting error messages, or just gobbledygook back from the server. The documentation wasn’t great, and I was basically guessing at the right URLs and parameters. I spent a good hour just fiddling around, trying different things until I finally got a response that looked like actual data. It was messy JSON, but data nonetheless!
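To give you a concrete flavor – and I’m using the publicly accessible MLB StatsAPI at statsapi.mlb.com as a stand-in here, which may differ from whatever I was actually hitting – a schedule request looks something like this. The `sportId=1` (MLB) and `teamId=146` (Marlins) parameters are specific to that API:

```bash
# Hypothetical request against the MLB StatsAPI (a stand-in, not
# necessarily the API from this post). sportId=1 selects MLB;
# teamId=146 is the Marlins in this API's numbering.
curl -s "https://statsapi.mlb.com/api/v1/schedule?sportId=1&teamId=146&startDate=2024-04-01&endDate=2024-04-30"
```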
Now that I had the data, I needed to make sense of it. JSON is basically a way to represent data as nested objects and arrays (think dictionaries and lists). It’s great for computers, but not so great for humans to read. So, I piped the JSON through `jq`, a command-line JSON processor. `jq` lets you filter, transform, and format JSON data. It’s a lifesaver.
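If you’ve never seen it, here’s the two-second demo: `jq` takes a filter expression and applies it to whatever JSON arrives on stdin.

```bash
# '.teams.home.score' is a path expression; jq drills into the nested JSON
echo '{"teams":{"away":{"score":3},"home":{"score":5}}}' | jq '.teams.home.score'
# prints: 5
```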
I spent another hour or so crafting `jq` commands to extract the specific information I wanted: the dates of the matches, the scores, and the winning teams. It was a lot of trial and error, looking at the JSON structure and figuring out the right paths to the data I needed.
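The right paths depend entirely on the JSON your API hands back. Sticking with the StatsAPI stand-in from above (whose schedule response nests as `dates[]` → `games[]` → `teams.home`/`teams.away`), the extraction would look roughly like this – the field names and the Braves’ team ID of 144 are assumptions about that particular API:

```bash
curl -s "https://statsapi.mlb.com/api/v1/schedule?sportId=1&teamId=146&startDate=2024-04-01&endDate=2024-09-30" \
  | jq -r '
      .dates[].games[]
      # keep only finished games against the Braves (id 144 in this API)
      | select(.teams.away.team.id == 144 or .teams.home.team.id == 144)
      | select(.status.abstractGameState == "Final")
      | [.officialDate,
         .teams.away.team.name, .teams.away.score,
         .teams.home.team.name, .teams.home.score,
         (if .teams.home.isWinner then .teams.home.team.name
          else .teams.away.team.name end)]
      | @csv'
```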
Finally, I had a series of `curl` and `jq` commands that would give me the data I wanted in a relatively clean format. I then wrote a small shell script to automate the process. The script would loop through a range of dates, make the API requests, extract the data, and append it to a CSV file.
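Reconstructing it from memory, the script was in the spirit of this sketch – same StatsAPI assumptions as above, and a real version would want error handling and proper month-end dates:

```bash
#!/usr/bin/env bash
# Sketch of the automation script, assuming the MLB StatsAPI stand-in
# from above: Marlins teamId=146, Braves teamId=144.
set -euo pipefail

OUT="marlins_vs_braves.csv"
echo "date,away_team,away_score,home_team,home_score,winner" > "$OUT"

# The extraction filter from earlier, kept in one place.
JQ_FILTER='
  .dates[].games[]
  | select(.teams.away.team.id == 144 or .teams.home.team.id == 144)
  | select(.status.abstractGameState == "Final")
  | [.officialDate,
     .teams.away.team.name, .teams.away.score,
     .teams.home.team.name, .teams.home.score,
     (if .teams.home.isWinner then .teams.home.team.name
      else .teams.away.team.name end)]
  | @csv'

# One request per month of the season.
for range in "2024-04-01 2024-04-30" "2024-05-01 2024-05-31" \
             "2024-06-01 2024-06-30" "2024-07-01 2024-07-31" \
             "2024-08-01 2024-08-31" "2024-09-01 2024-09-30"; do
  read -r start end <<< "$range"
  curl -s "https://statsapi.mlb.com/api/v1/schedule?sportId=1&teamId=146&startDate=${start}&endDate=${end}" \
    | jq -r "$JQ_FILTER" >> "$OUT"
  sleep 1  # be polite to the server
done
```

Keeping the `jq` filter in a shell variable means there’s exactly one place to fix when the JSON paths inevitably turn out to be slightly off.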
It wasn’t pretty, but it worked! I ran the script, and after a few minutes, I had a CSV file listing Miami Marlins vs Atlanta Braves games, along with the scores. I opened the CSV in a spreadsheet program and cleaned up the data a bit, fixing some formatting issues and adding a few extra columns.

And that’s it! From a simple search query to a working data pipeline, I managed to extract the data I needed. It took some time and effort, but it was definitely worth it. The whole process – the initial searching, the API exploration, the `jq` wrangling, and the script writing – took me about half a day. I have a lot to learn, but I’ll keep practicing!
