How The Texas Rangers Generate Advantage With Big Data

Kandinsky meets a laptop at a Baseball Stadium. [prompt: a sunny day looking over my laptop at the baseball stadium. Green grass and brown dirt sparkle in the sun in this style https://s.mj.run/0KIXC9g0OKo --ar 16:9]

I attended "Big Data in the Fall Classic - How the Texas Rangers gained a competitive edge through big data" and I want to share with you what I learned.

The conversation was with two members of the Texas Rangers R&D team: Alexander Booth, the Assistant Director of R&D, and Oliver Dykstra, a Data Engineer. They discussed stakeholders, the World Series, data, AI, and the tech stack they use.

Research and Development teams across the league range from 5 to 40 people - Booth says that just about every team has some data presence, but there are disparities based on team size or market share, and that he feels the Rangers are "somewhere in the middle". They are coming off a fresh World Series win, and it seems like the team is growing and doing exciting stuff - definitely looks like a fun time to be in Texas R&D.

"This year we played Baltimore in the American League Division Series and the first couple of games were in Baltimore... they changed their stadium a few years back and they pushed the left field fence back, which makes it harder to hit home runs to left field, except we knew because of the weather that we'd have a better than normal chance of hitting home runs to left field. We bumped players up in the lineup... depending on who the pitcher was... our recommendations made their way to the field, and of course, we won those first two games."

The team talked about the importance of data quality and trust and described their role in the organization as a kind of "Data as a Service" team. Just like any data team, they have stakeholders - managers, coaches, scouts, player development, sport science, executives.

What does a baseball data organization look like?

Service Oriented Architecture for the Texas Rangers R&D team

The Texas Rangers R&D department runs as a combination of data distributor and data collector. In addition to ingestion of external or game play data, a large number of reports are generated internally from amateur, international, and advance scouts. The team says that analytics is just one part of the whole decision making engine and that domain expertise from the scouts is a crucial component as well. They even have a bio-mechanist on staff now.

Baseball is a very text rich field with thousands of scouting reports written across amateur and pro games across the country.

The core tenet behind the research department is that "availability leads to innovation/disruption." They have adopted a mindset of collaboration and transparency and it is already breeding trust with the new manager. Bochy (the new skipper) called for a consult about the new minimum three-batter rule for relievers - Bochy is a well known lover of single at-bat relievers, and holds the record for the most single-batter usage during his time as manager of the San Francisco Giants [ref]. The changes to the rule meant that conventional wisdom about relief strategy needed to be adjusted and some conclusions needed to be inferred and hypotheses checked - a perfect job for a data team with access to mounds of baseball data.

What kind of data is available to teams?

The Rangers ingest terabytes of data daily and it represents play tracking, ball tracking, bio-mechanical marker-less player motion capture, bat and swing speed, run speed, and many characteristics of the ball such as velocity and spin rate. They even keep track of the wind speeds. Much of it comes from official sources like MLB, some from external websites or sources, and some is internally produced like scouting reports.

They also collect opt-in player data from devices ranging from Oura rings to Fitbits to force plates in the weight room. Sleep, diet, workout data, and blood pressure are all collected. According to the team, the numbers show with a high degree of confidence that getting fewer than roughly 4 to 6 hours of sleep results in a worse performance the next day.

Fun Fact: Madison Bumgarner wore a blood pressure monitor - but unlike a normal human, when he stepped on the mound to pitch, his heart rate actually went down.

Perhaps the most ubiquitous system is the set of Hawk-Eye cameras that track player and ball movements around the field. Here's a diagram they shared illustrating the setup in Texas:

Player tracking system installed at all MLB parks. "Server Room" is my favourite part.

These systems and ones like them are not just in major league ballparks, but "permeate through the minor leagues and amateurs." The next generation of players will grow up with highly available baseball data, and the R&D team reports that newer players such as Josh Jung from Texas Tech are exiting college and looking for the specific metrics and reports that they were exposed to in their amateur games.

Interesting Results from Rangers Baseball Analysis

The Rangers are deep in the "launch angle revolution" that is happening across the league. They had a hype slide showing the optimal "sweet spot" for a hit along with a picture of Adolis Garcia, who is the most recent American League Championship Series MVP and hit home runs in four consecutive games of that series.

Over 60 mph off the bat and between 25 and 30 degrees translates to approximately an 80% chance of getting a hit

I took a look at Adolis' launch angle graph on Baseball Savant and he is comfortably in that 25-35 range. His exit velocity is well above 65 mph though. 😄

Adolis Garcia seemed dialled in to the "sweet spot" in 2023. [ref: savant] Color is SLG.
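
To make the slide's rule of thumb concrete, here is a minimal sketch that flags batted balls falling inside the quoted window (over 60 mph off the bat, between 25 and 30 degrees). The thresholds are taken from the slide caption as I noted it down; the `BattedBall` record, the function names, and the sample values are my own assumptions for illustration, not anything the Rangers showed.

```python
from dataclasses import dataclass

# Thresholds as quoted on the slide; illustrative only, not the
# Rangers' actual model.
MIN_EXIT_VELOCITY_MPH = 60.0
MIN_LAUNCH_ANGLE_DEG = 25.0
MAX_LAUNCH_ANGLE_DEG = 30.0


@dataclass(frozen=True)
class BattedBall:
    """One batted ball (hypothetical record layout)."""
    exit_velocity_mph: float
    launch_angle_deg: float


def in_sweet_spot(ball: BattedBall) -> bool:
    """True when the ball is hit hard enough and inside the angle window."""
    hard_enough = ball.exit_velocity_mph > MIN_EXIT_VELOCITY_MPH
    good_angle = MIN_LAUNCH_ANGLE_DEG <= ball.launch_angle_deg <= MAX_LAUNCH_ANGLE_DEG
    return hard_enough and good_angle


def sweet_spot_rate(balls: list[BattedBall]) -> float:
    """Share of batted balls inside the window (0.0 for an empty list)."""
    if not balls:
        return 0.0
    return sum(in_sweet_spot(b) for b in balls) / len(balls)


# Made-up sample values, not real Statcast data.
sample = [
    BattedBall(95.0, 27.0),   # inside the window
    BattedBall(101.0, 12.0),  # hit hard, but too flat
    BattedBall(55.0, 28.0),   # good angle, but too soft
    BattedBall(88.0, 30.0),   # inside the window (upper edge)
]
```

Calling `sweet_spot_rate(sample)` gives the fraction of balls in the window, which is the kind of per-player number you could then compare against actual hit outcomes.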

The Rangers staff also showed off this interesting graphic that they described as a new statistic called "Pitcher Deceptiveness." It seems to be based on the ball not being visible as it comes out of the pitcher's hand.

Visualization: Pitcher Deceptiveness

Though they didn't show it off at all today, Booth also mentioned that he thought generative AI and LLMs would become increasingly important. In another webinar, he described how the Rangers are using AI "as a co-pilot for training data" [ref] - in order to automate some tedious processes like labelling data.

Availability Leads to Innovation

The main takeaway for me personally was that the Rangers are taking data ingestion and quality seriously. Part of the reason I wanted to attend this webinar was because a data engineer would be there. I think data engineering is a massively important part of a modern agile data organization. Here is what Oliver told me about their tech stack:

The Data Stack

  • Multi-cloud (GCP and AWS)
  • Airflow to orchestrate ingestion (actually looks like they use Astronomer)
  • Data arrives through FTP
  • Web scraping, APIs, Fangraphs
  • Everything goes into the lake; the lakehouse is in Databricks
  • Transformations and models are built in Databricks
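
To picture what "orchestrating ingestion" means here, this is a rough sketch of the idea behind a tool like Airflow: tasks are declared with their dependencies and run in an order that respects them. It deliberately uses only the Python standard library (`graphlib`) rather than Airflow itself, and every task name below is a hypothetical stand-in I made up from the bullets above, not the Rangers' actual pipeline.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks that
# must finish before it can start.
PIPELINE = {
    "pull_ftp_drop": set(),
    "scrape_public_sites": set(),
    "land_in_lake": {"pull_ftp_drop", "scrape_public_sites"},
    "transform": {"land_in_lake"},
    "train_models": {"transform"},
}


def run_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return one valid execution order for the task graph.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())


def run_pipeline(dependencies: dict[str, set[str]], tasks: dict) -> list[str]:
    """Call each task in dependency order and return the names that ran."""
    completed = []
    for name in run_order(dependencies):
        tasks[name]()  # each task is a plain zero-argument callable
        completed.append(name)
    return completed


# Placeholder tasks that only record that they were called.
log: list[str] = []
TASKS = {name: (lambda n=name: log.append(n)) for name in PIPELINE}
```

Running `run_pipeline(PIPELINE, TASKS)` executes the two ingestion tasks first (in either order), then the lake load, the transformations, and finally model training. A real orchestrator adds scheduling, retries, and monitoring on top of this ordering.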

Booth added that they also use these Reporting/Dashboarding Tools:

  • Heavy into Tableau on the business and baseball side
  • "Javascript Visualizations"
  • Shiny, Flask, Django applications

Rangers R&D feels like a startup - agile, growing quickly, and not afraid to use whatever technology gets the job done. I am inspired by the company culture that is captured perfectly by this:

"In my opinion, if anyone in the Rangers has a question that can be answered with data, they should be empowered to go find those answers themselves."

Thanks to the Texas Rangers for sharing and participating in these webinars, and thanks to WareCorp for hosting.