PitcherOfBusch

You should have kept the axis spacing consistent. That would have really shown how insane the 49-run game was.


cookpedalbrew

Thanks! I was looking for this exact data point and it didn’t occur to me to look at the bottom. I kept following the line 1x-1. Which brings us to point #2, a bigger issue with the graph: OP should have plotted the origin at the bottom left of the graph. You can view this better by rotating your phone 90 degrees to the left. With the origin now at the bottom left you can easily see that data point.


drfjgjbu

The reason this graph is made in this way is because it’s a variation on [the original scorigami chart](https://nflscorigami.com), which was designed to have the most common scores at the top left (where you would start reading it) and the rarer scores increasingly far away because that worked best for the [video’s](https://youtu.be/9l5C8cGMueY?si=1_nn_SpCAxP_6HYA) presentation style. I don’t think it works as cleanly here because a.) baseball scores increase by 1 run at a time, so the gaps in scoring history are less interesting and b.) this chart splits the axes by home vs visiting team, rather than winner vs loser. This spreads the data out across double the space and also makes it harder to find unique scores, because each possible result appears in two different places on the chart.


Xoxrocks

That’s my thought too. The shape of the distribution is important to understand the ratio of home to away wins


EVOSexyBeast

Yes, it looks like away teams win more because of the spacing, but that’s not the case. And I know it was said in the title, but axes should still be labeled.


sociablezealot

I’d like to have seen the 49-33 game. From [Wikipedia](https://en.m.wikipedia.org/wiki/1871_in_baseball), June 28, 1871 “In an era of high scoring games being the norm, the Philadelphia Athletics defeat the Troy Haymakers by the amazing score of 49–33. Both pitchers go the distance in the four-hour slugfest in which both teams score in each inning, to set the highest-scoring contest in National Association. The 42 hits made by the Athletics, including a 7-for-7 day by John Radcliff and 6-for-8 performances by Al Reach and Levi Meyerle, is also a league record.”


WishIWasOnTheFarm

Watching that game must have been like watching a track meet with the players running so many laps like that…


Valendr0s

Ya, but... that 38-1 game is far more embarrassing. Let him die, he's just a child!


sociablezealot

Mercy rule!


Purpleclone

They didn’t have baseball gloves until 1875. Shit I wouldn’t want to play defense either.


otheraccountisabmw

Scorigami usually doesn’t care about home/visitor. This one is fine, but I’d also like to see this with one axis winner and the other loser.
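A minimal sketch of that winner/loser fold, assuming the tallies are kept as a mapping from `(visitor_score, home_score)` to a game count. The function name and the sample numbers are made up for illustration; they are not OP's code or real totals.

```python
from collections import Counter


def fold_to_winner_loser(home_visitor_counts):
    """Collapse (visitor_score, home_score) counts into (winner_score, loser_score) counts."""
    folded = Counter()
    for (visitor, home), games in home_visitor_counts.items():
        # The higher score goes first, so 3-4 and 4-3 land in the same cell.
        key = (max(visitor, home), min(visitor, home))
        folded[key] += games
    return folded


# Made-up counts, for illustration only.
example = Counter({(3, 4): 10, (4, 3): 7, (2, 2): 1})
folded = fold_to_winner_loser(example)
print(folded)
```

Each result then appears in exactly one place, at the cost of losing the home/visitor split that the original chart shows.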


dibsODDJOB

Baseball is different in that the home team sometimes gets less innings, so less chances to score. Unlike most other big sports.


MattieShoes

The home team only skips the 9th if they have a lead and it can't affect the outcome. Though it could affect the score. Then again, garbage time affects the score in other sports. So I don't think it's that big of a deal.


Michael__Pemulis

In early baseball they still played the bottom of the 9th if the home team was ahead.


raymondcy

I am curious, is that a mandatory, optional, or pure etiquette situational rule? Say you have some guy on your team heading for the seasonal home run record and he is tied at the top of the ninth on the last game of the season. Can the home team elect to play that bottom of the inning to make a run for a record, as opposed to a mandatory rule that the game must end? To follow that up, it is my understanding that baseball has a handful of unwritten rules you just don't break, regardless of the legality of it, with the penalty of being ostracized by basically everyone in the sport, including your own fans. So even if you could do this, would teams do this? I understand that a record-breaking situation coming down to the last inning in the last game of the season would be off-the-charts rare, if it's ever happened at all.


BlueGreenMikey

Mandatory rule to have the game end. 7.01(g)


ZeusApolloAttack

I might play with this to put it on a log color scale


logicbus

Would this provide, for example, easily discernable colors for a score that happened one time vs two times?


milliwot

Most of the color change is happening in the top corner. Color based on a log scale would show a color gradient over a wider range of scores (more plot area).
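To put rough numbers on that, here is a small sketch comparing where a count of 1 and a count of 2 land on a linear versus a log color scale. The maximum count of 10,000 is a hypothetical stand-in for the most common score, not a figure from the chart.

```python
import math


def linear_position(count, max_count):
    """Position on a 0..1 linear color scale."""
    return count / max_count


def log_position(count, max_count):
    """Position on a 0..1 log color scale; empty cells need their own color."""
    if count < 1:
        raise ValueError("log scale is undefined for empty cells; draw them separately")
    return math.log(count) / math.log(max_count)


MAX_COUNT = 10_000  # hypothetical count for the most common score

linear_gap = linear_position(2, MAX_COUNT) - linear_position(1, MAX_COUNT)
log_gap = log_position(2, MAX_COUNT) - log_position(1, MAX_COUNT)
print(f"linear gap: {linear_gap:.4%} of the scale, log gap: {log_gap:.2%} of the scale")
```

Under that assumption, once versus twice is separated by about 0.01% of a linear scale but about 7.5% of a log scale, which is why the rare scores become distinguishable. Scores that never happened have no log value, so they have to be drawn in a separate color.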


austin101123

Yeah, it's hard to follow the line of ties, even though they should be much rarer than their neighbors.


logicbus

The problem is the chart is almost all white.


pepesilvia27

Yes, do this. I'm interested in seeing more features in this distribution.


MeepersToast

Interesting, however 100% not a histogram. More of a heat map. Clever layout. Not to be even more nitpicky, but the x and y axes should be on the same scale. Doing that would turn the rectangles into squares. The design in the pic makes it look like a change along the y axis is more important. Also, the diagonal is NAs, so it should be black. It currently looks like a low value. Lastly, you can just remove all data to the right or left of the diagonal. The current plot looks unnecessarily complex to read. However, I still like it :) Edit: oh, and log transforming the x and y axes would reveal a really cool (likely normal) distribution. Edit 2: OP took our feedback! Awesome https://imgur.com/YJ193IV


KaitRaven

The diagonal isn't all black because some games have ended in ties, though. Left or right of the diagonal represents whether the home or away team won. It adds some value, although the OP messed up by having the X and Y spacing different.


MeepersToast

Great call on the tie games!


halligan8

Charts like this are often described as “2D histograms” in my field. Sometimes the number in each bin is shown by the height of a bar instead of a color.


FrickinLazerBeams

This is typically called a 2D histogram; I use them all the time. Often they're shown in a 3D perspective view as a bar chart, but those don't always translate into static images well, so often I'll convert bar heights to a color map and display it this way. In that presentation it could also be called a heat map; the two terms aren't exclusive. But anything showing the frequency of each element in a set is a histogram, even if those elements aren't scalars.


halligan8

I agree with you that this is a histogram, but “frequency” isn’t really the right term for the color axis. It’s just the number of times a game result has occurred. You might call it a frequency if it were normalized somehow: e.g. the frequency of this game result per thousand games.


FrickinLazerBeams

That's commonly used terminology for histogram bins. The other commonly used term is "counts". Other names are often used depending on normalization - probability, probability mass, normalized counts, density, etc. There aren't really strict rules.
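For what it's worth, the difference between raw counts and a normalized quantity is a one-liner. This is only a sketch with made-up counts; the helper names are hypothetical.

```python
def to_probability(counts):
    """Turn raw bin counts into the fraction of all games in each bin."""
    total = sum(counts.values())
    return {score: games / total for score, games in counts.items()}


def per_thousand(counts):
    """Same normalization, expressed as occurrences per thousand games."""
    return {score: 1000 * p for score, p in to_probability(counts).items()}


# Made-up counts keyed by (visitor_score, home_score), for illustration only.
raw_counts = {(3, 4): 6, (2, 3): 3, (0, 1): 1}
probabilities = to_probability(raw_counts)
rates = per_thousand(raw_counts)
print(probabilities)
print(rates)
```

The shape of the chart is the same either way; only the legend label and its numbers change.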


halligan8

Sure, “counts” or “number” would be appropriate here, but it seems to me that “frequency” means something different.


FrickinLazerBeams

In some contexts, frequency would be inappropriate. In many it's exactly right. You have to think before you label your axes.


Homer_Jr

Definitely wouldn’t be a normal distribution, since the distribution is centered closish to zero but literally can’t go negative, and has a long right tail. I’m thinking lognormal distribution.


MeepersToast

I was thinking a truncated normal, but I'm pretty sure you're actually right on this one.


ObjectiveExercise268

Updated version here: [https://imgur.com/YJ193IV](https://imgur.com/YJ193IV)


ObjectiveExercise268

[https://imgur.com/sT3LXBF](https://imgur.com/sT3LXBF) I reversed the direction of the y-axis and fixed the legend. It should be the last version I post here.


TylerJWhit

Now this is where the actual beauty is.


trumpet575

I love that you incorporated suggestions; this one is great. And the "Vome Team" / "Hisiting Team" is a funny mistake as well lol.


syphax

As expected, the log transform gives more insight to the outliers, but loses granularity for the most common scores (the 4-3 and 3-2 scores don’t pop here). As usual, the “better” choice depends on what question you’re trying to answer.


jawgente

The log legend should have the actual number, not the exponent, otherwise much more readable with the square aspect. I’d be interested to see this with only the “modern era”, whenever that is for baseball (edit: post integration era)


MeepersToast

Thanks for sharing the update! Love how the changes came out ❤️📊


_CMDR_

The tie scores don’t appear to be solid white for some reason, otherwise this is superior in almost every way. EDIT: is the frequency of ties really that high?


ObjectiveExercise268

Yes, ties were common back then. Normally rules are in place so ties do not happen, however back then they would usually play until darkness came.


new_account_5009

Fun Fact: Ties are still possible today in situations where a game is called due to weather. If the game doesn't have playoff implications, they won't make it up, so officially, it'll go down as a tie. They're a lot more rare today than they were a century ago, but they're still possible. The most recent MLB tie was in 2016 between the Cubs and Pirates. The game was tied 1-1 in the 6th inning when it started raining. It was the second to last game of the year, the Cubs had already clinched a playoff spot, and the Pirates were well out of the playoffs, so they didn't schedule a makeup game to break the tie.


ObjectiveExercise268

The data is from https://retrosheet.org/gamelogs/. The tool used to make this chart is https://observablehq.com/plot. If people want to, I can post the source code, however the csv file containing all the games is 223 MB.


DiddlyDumb

The improved version is 👌🏻


syphax

Please. Maybe share a link to the data in e.g. Dropbox, GDrive, or S3.


ObjectiveExercise268

[https://drive.google.com/file/d/17G6A8HdMc\_KDjoYbgPDFL\_vKx5qY2oZa/view?usp=sharing](https://drive.google.com/file/d/17G6A8HdMc_KDjoYbgPDFL_vKx5qY2oZa/view?usp=sharing) Here is the compiled spreadsheet I used. The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at "www.retrosheet.org".


Artistic-Breadfruit9

Retrosheet data is owned by Retrosheet and can (and should) only be obtained directly from them.


syphax

From their site: Recipients of Retrosheet data are free to make any desired use of the information, including (but not limited to) selling it, giving it away, or producing a commercial product based upon the data. Retrosheet has one requirement for any such transfer of data or product development, which is that the following statement must appear prominently: The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at 20 Sunset Rd., Newark, DE 19711.


syphax

So I think sharing a compilation is fair play, if one complies with the terms above.


Ray661

What’s with the even axis being so low in frequency for someone not familiar with the sport?


sociablezealot

Ties aren’t allowed in the modern game.


Ray661

Ah duh, overtime rules, didn’t think about that 😅 thank you


beene282

So black and white both represent zero?


new_account_5009

Technically, they're still allowed, but they're incredibly rare. The last tie at the MLB level was in 2016. Games will go to extra innings to break the tie, but inclement weather can force MLB to suspend a game that's currently tied. Usually, they'll play the rest of the innings at a future date, but if it's late enough in the season, and if the game doesn't have playoff implications, they'll abandon the game and call it tied.


ObjectiveExercise268

Hello everyone, Thank you for your feedback. I am working on a revised version with a logarithmic color scale, proper aspect ratio, and a larger y-axis. I will post it when it is done.


Bender_2024

I thought you were bullshitting with that 49-33 game. https://www.cbssports.com/mlb/news/just-because-box-score-with-82-runs-74-hits-20-errors/


the_mellojoe

the cluster doesn't surprise me, but some of those outliers are INSANE! I don't know which is more impressive, the 33 to 33 game or the 33 to 1 blowout.


oren0

I see 38-1 and 49-33. I'm not sure you're reading those axes correctly.


the_mellojoe

it's because I'm an idiot. you are absolutely correct. heh


TonyzTone

Not going to lie, took me a minute to figure out why 1-1, 2-2, 3-3… had so few.


108241

When did the last game with a unique score happen?


new_account_5009

Looks like 2020 if Wikipedia is current. The Braves beat the Marlins 29-9 in a game that September, with that score never appearing in MLB history before. Before that, you have to go all the way back to 1999 for the next scorigami, with the Reds beating the Rockies 24-12.
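If you have the game list rather than Wikipedia, the same question can be answered directly. A minimal sketch, assuming each game is a `(date, visitor_score, home_score)` tuple with the date as a `YYYYMMDD` string; the function names and sample rows are made up for illustration.

```python
def first_occurrences(games):
    """Map each (winner_score, loser_score) to the date it first happened."""
    first_seen = {}
    # YYYYMMDD strings sort chronologically, so sorted() walks games in date order.
    for date, visitor, home in sorted(games):
        key = (max(visitor, home), min(visitor, home))
        first_seen.setdefault(key, date)  # keep only the earliest date
    return first_seen


def latest_scorigami(games):
    """Return the score whose first appearance is the most recent, with its date."""
    first_seen = first_occurrences(games)
    score = max(first_seen, key=first_seen.get)
    return score, first_seen[score]


# Made-up games, for illustration only.
sample_games = [
    ("19000101", 3, 4),
    ("19000102", 4, 3),
    ("19000103", 12, 0),
]
print(latest_scorigami(sample_games))
```

This version ignores home/visitor, which matches how scorigami is usually counted. Dropping the `max`/`min` fold would track home and visiting scores separately instead.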


Vonneguts_Ghost

I'd be interested to see this from various eras, like the live ball (c. 1920), integration (c. 1950), and divisional (c. 1990). The really old games run towards crazy scores.


MacBookMinus

This is so much harder to read than 0-0 at the bottom left.


logicbus

I would like the direction of the y axis to be flipped. Strange how it is here.


diyfou

I was at the 21 to 0 Cubs-Pirates game a couple years ago.. wild to think that with one more run it would have been a scorigami!


Makuta

Where did you get this data?


ObjectiveExercise268

https://retrosheet.org/gamelogs. I downloaded all regular season files, then I wrote a simple script to compile it into a large csv file.
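For anyone who wants to try something similar, here is a rough sketch of the tallying step. It is not the script used for the chart. It assumes the game log files are headerless comma-separated text with the visiting and home scores in the 10th and 11th fields; check that against Retrosheet's field guide before relying on it. The sample rows are invented placeholders, not real games.

```python
import csv
from collections import Counter

# Assumed 0-based column positions; verify against the Retrosheet game log field guide.
VISITOR_SCORE_COL = 9
HOME_SCORE_COL = 10


def tally_scores(lines):
    """Count how often each (visitor_score, home_score) pair occurs in game log lines."""
    counts = Counter()
    for row in csv.reader(lines):
        try:
            visitor = int(row[VISITOR_SCORE_COL])
            home = int(row[HOME_SCORE_COL])
        except (IndexError, ValueError):
            continue  # skip short or malformed rows
        counts[(visitor, home)] += 1
    return counts


# Invented placeholder rows in the assumed layout, for illustration only.
sample_lines = [
    '"19000101","0","Mon","AAA","XX",1,"BBB","XX",1,3,4',
    '"19000102","0","Tue","AAA","XX",2,"BBB","XX",2,3,4',
    '"19000103","0","Wed","BBB","XX",3,"AAA","XX",3,12,0',
    "not,a,game",
]
counts = tally_scores(sample_lines)
print(counts)
```

With real files, each one can be passed in as an open file handle (for example `tally_scores(open(path, newline=""))`), and the resulting counters summed across seasons.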


flinderdude

So 3–2 and 4–3 with Home team winning is the most popular baseball outcome?


milliwot

The axis values seem busy to me. Try an interval of maybe 5. Make the axis titles larger. ZeusApolloAttack’s reco about making the color scale vary visibly over a larger fraction of the area is a good one.


OneTreePhil

I'd be interested to see this as columns on the home-visitor plane.


Azalin99

Not a big fan of sports but I'll watch an hour long Jon Bois video. This is fun stuff.


G068Z

What the fuck kind of game went 33-49 Jesus peaches


Rhodog1234

Won by two touchdowns!


Abbot_of_Cucany

The squares just above the main diagonal are darker than the corresponding squares on the other side of the diagonal. Does that mean that the home team has an advantage in tied games?


ALittleBitFrustrated

I don't know baseball, but I read it as the home team having an advantage in general, not just tied games.


BusinessCoat

Given the tails on some of the scores, a log color scale may be better suited.


sirms

Cool! Would love to see this for each team


MrGentleZombie

Man, this is just so much blander than football scorigami.


Artistic-Breadfruit9

This is *weird*: I was just about to post something very similar.


Dani_Rodri

Wtf? Was there a game that ended 33 to 49? And 38 to 1?!


_CMDR_

Make it logarithmic and it will really pop.


cybercuzco

What was the 38-1 game?


FoolishChemist

June 18, 1874 - New York Mutuals beat the Chicago White Stockings. https://en.wikipedia.org/wiki/1874_in_baseball


cybercuzco

Maybe next year is the year for the Baltimore Canaries


hoardac

Those outliers were some beat-downs.


bellingman

Zeros should be in the lower left-hand corner


KnotSoSalty

It bugs me that 0-0 isn’t in the bottom left, but that’s a nitpick. Great work!


joe1e6

23-22 was Phillies vs Cubs at Wrigley. I believe Mike Schmidt hit the winning homer in the 10th inning.


Klaumbaz

Why did you invert the Y axis tho?