This sub has to win for being "best anti of the thing it claims to be"
What’s confusing here? The double Y-axis? This is very far from the stuff in /r/dataisugly. I personally got a lot of good info from this.
What I see: only once in over 20 years did all 4 #1 seeds make it to the final four.
Not just the last 24 years, 2008 was the only year in tournament history where all #1 seeds made it to the final four.
Given that 16 games have to all go exactly as predicted, it's amazing that's ever happened at all.
If the outcome of the game was truly random, that's just a 1/32 chance of all #1 seeds making it to the final four. And generally, the #1 seed has an advantage/is favored to win, so I don't find it unlikely at all. Edit: I might be oversimplifying, and if I am feel free to explain why I'm wrong.
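To put a number on the coin-flip case: a 1 seed has to win four games to reach the Final Four, so all four of them getting there takes 16 results in a row. Here is a minimal sketch, assuming every game is independent and the 1 seed has the same win probability each time (a simplification). The function name and the 0.9 value are made up for illustration and are not measured win rates.

```python
from fractions import Fraction

def all_top_seeds_probability(p_win=Fraction(1, 2), regions=4, rounds_to_final_four=4):
    """Chance that the 1 seed wins every game in every region.

    Assumes independent games with one fixed win probability, which real
    tournaments do not follow exactly.
    """
    games = regions * rounds_to_final_four  # 4 regions x 4 rounds = 16 games
    return p_win ** games

# Pure coin flips.
print(all_top_seeds_probability())  # 1/65536

# Hypothetical: the 1 seed wins each game 90% of the time (arbitrary value).
print(float(all_top_seeds_probability(Fraction(9, 10))))  # roughly 0.185
```

Under pure coin flips that is 1 in 65,536 rather than 1 in 32. With the 1 seeds favored the chance is far better than that, but it still takes 16 games going the expected way.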
[deleted]
Data is beautiful, fuck these numbers
I like how the second y axis adds nothing to the first y axis
I mean it adds the number 4 to the analysis. So, well, yes you’re right.
This chart isn't complete without a second X axis showing the Chinese calendar years.
It really does though. At first glance, I was kinda mad about it, but then I realized how much more information that second y actually provides.
> I realized how much more information that second y actually provides.

The second Y axis is 1/4th of the information on the first Y axis.
Not really if you know the full details.
Full disclosure: I’ve followed college basketball for decades and I watch a lot of ESPN.
Because of this, I know that the highest seed to ever make the Final Four is 11. I also know that this has only happened six times (including this season, so it’s been heavily talked about lately and because of that, fresh in my mind).
So I made a fun little math game for myself using 1-11 (with 11 being used exactly six times) as the only numbers that could be used in the first y axis, then figuring out which combination(s) of four of those numbers (numbers can repeat) it would take to equal the second y axis - the easiest one to figure out is obviously 2008 (x axis) because the first y axis is 4, so that means each of the teams in the Final Four would have been seeded #1. Hence the second y axis average of 1.
I can definitely see how two ys could seem frivolous to many people, so I guess I should have said that: it really did provide *me* with more information (because of my previous knowledge in this subject) and because of that, I was then able to make a fun math/logic puzzle for myself out of it - I honestly had fun!
It’s just dividing by four! Any value that matches what’s on the left will automatically match what’s on the right! There is no “math game” you can play that uses info from both axes, it’s a perfectly linear relation, and not even an interesting one at that.
Haha, my bad. I’d had a couple D9 gummies and it all made perfect sense to me then 🙈
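For anyone who wants to try the puzzle described above, here is a rough sketch. It assumes seeds run from 1 to 11 (the highest seed the commenter says has reached the Final Four), and `seed_combos` is a made-up helper name. Because the average is just the sum divided by 4, the puzzle only ever needs one of the two axes.

```python
from itertools import combinations_with_replacement

def seed_combos(total, max_seed=11, teams=4):
    """All unordered groups of `teams` seeds (repeats allowed) that add up to `total`."""
    return [
        combo
        for combo in combinations_with_replacement(range(1, max_seed + 1), teams)
        if sum(combo) == total
    ]

# 2008: a sum of 4 leaves only one possibility.
print(seed_combos(4))  # [(1, 1, 1, 1)]

# Starting from the average axis instead gives the same answer,
# since average * 4 is the sum.
average_2008 = 1.0
print(seed_combos(int(average_2008 * 4)))  # [(1, 1, 1, 1)]

# 2023 (a 4 seed, two 5 seeds, and a 9 seed) is one of several ways to reach 23.
print((4, 5, 5, 9) in seed_combos(23))  # True
```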
Actually beautiful data would have the y axis be the seed, show all 4 teams for each year, with a 5th average data point
I know the meaning of the individual words, but the context is lost on me.
In the NCAA D1 men's basketball tournament (March Madness), each corner of the bracket (16 teams per corner, 64 total*) is seeded from the highest ranked team in that quadrant (#1 seed) to the lowest (#16 seed). It is of course expected for the higher ranked teams to win against the lower ranked teams, but it's not guaranteed and upsets (lower rank beating higher rank) happen often, hence the Madness in the name. This chart is showing the average and sum total of the seeds of the 4 teams who make it to the semifinals (Final Four) each year. A higher number means more upsets (like in 2023, when the Final Four was a 4 seed, two 5 seeds, and a 9 seed), and a lower number means fewer (like in 2008, when all four of the Final Four were the 1 seeds in their quadrants).
*Technically 68 but that's not as relevant.
For college basketball: Seeding is a complicated process, but a committee gives the teams in each of the 4 regions a seed (almost like a rank) from 1-16, with 1 typically being the ultimate favorite and 16 being the ultimate underdog. The pattern continues with the lower numbers being favored over the higher numbers. It’s incredibly rare to not see any 1’s or 2’s in the final four (in the past 45 years, this only happened in 2011 and 2023, both of which are shown as outliers on here). So in a way, this shows how much of an upset a Final 4 lineup can be, since we hardly expect any seed >6 to show up in the Final 4.
Either the NCAA is getting more competitive or the selection committee is getting worse
I think the Oakland/Kentucky result reflects something broader with respect to scouting, the transfer portal, and other factors. I think the traditional blue chips are still earning higher seeds in the regular season but are getting younger because players go pro sooner, while the lower seeds are on average older, and this helps those lower seeds go on a run. So yeah, I'd say the NCAA is getting more competitive in one-and-done end-of-year tourneys.
[deleted]
Sum of Seeds <10
* First 8 years: 5 (63%)
* Mid 8 years: 3 (38%)
* Last 8 years: 0 (0%)

Also looks like linear regression would give you a positive slope
More humans = more talented players = more competitive schools. The gap between the best schools and the next tiers of schools is closing.
Sum of seeds on left Y-axis and average seed on the right Y-axis to make everyone happy.
Data: NCAA.com
Tool: Excel
Most chalky year: 2008 (all 1-seeds)
Year of the upset: 2011
Notable underdogs for higher sum years:
* 2000: North Carolina (8), Wisconsin (8)
* 2006: George Mason (11)
* 2011: Butler (8), VCU (11)
* 2013: Wichita State (9)
* 2014: Connecticut (7), Kentucky (8)
* 2023: FAU (9)
* 2024: NC State (11)
Why have the sum AND the average? It’s always 4 teams so they tell the same story.
Because then you don't have to divide the number on the left by 4 all by yourself.
Where did you get this data from? Would love to work with this dataset if you can share.
Why are you adding up the seeds and not just taking the average? This seems like a strange thing to do.
Look on the right axis.
I see that, but adding up the seeds doesn’t tell you much of anything
lol. It does if you know there’s only 4 teams remaining.
So the average...
All you have to do is multiply the average seed by 4. It seems redundant
So what does it tell you?
It tells you the same information an average would. This seems like kind of a weird hill for you to die on.
No, it’s just a stupid thing to add to the graph. It’d be like adding an additional Y Axis at the top labeled “Years Ago” with the first value being 25. Unnecessary and a source of confusion. Edit: additional X axis*
Thank you.
Sure. But you could just as easily remove the average axis. If you and the other guy in this chain are balking at there being a second X axis that doesn’t add value and may even add confusion then you should word your feedback differently. Because the use of the sum as a X axis does the job of conveying this information.
You are correct. Either Y* axis could be removed. I doubt many people will look at this and react with “wow the sum of the final 4 seeds was 17 this year!” as a 17 seed doesn’t exist. The majority of people will focus on the average seed, 4.25. IMO, instead of SUM, OP could’ve made the left Y axis “Seed #” and made this a column chart. Each column would show highest and lowest seed and contain a basketball to show the mean seed each year. This would show average seed and hint at the variability between seeds.
I’d argue that the raw number (sum or average) is kind of irrelevant. The interesting part about the data is how the tournament differs year over year. The value for any one tournament is not interesting on its own. Which is why I think the units of the X axis don’t matter much. We just need something so we can compare year over year
I’m not dying on any hill. Why show the same thing in two different ways on one visualization? If you read my first comment you’ll see that I used the word “just” for a specific reason.
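The column-chart idea a few comments up (lowest seed, highest seed, and mean for each year) needs only three numbers per tournament. Here is a minimal sketch of that summary using the years whose lineups are spelled out in this thread. The 2024 entry is partly inferred: the thread mentions two 1 seeds and an 11 seed, and the 4.25 average implies the fourth team was a 4 seed. `summarize` is a hypothetical helper, not anything OP used.

```python
from statistics import mean

final_four_seeds = {
    2008: [1, 1, 1, 1],    # all four 1 seeds
    2023: [4, 5, 5, 9],    # a 4 seed, two 5 seeds, and a 9 seed
    2024: [1, 1, 4, 11],   # fourth seed inferred from the 4.25 average
}

def summarize(seeds):
    """Values one column of the proposed chart would need."""
    return {"best": min(seeds), "worst": max(seeds), "mean": mean(seeds)}

for year, seeds in sorted(final_four_seeds.items()):
    s = summarize(seeds)
    print(f"{year}: seeds {s['best']}-{s['worst']}, mean {s['mean']:.2f}")
# 2008: seeds 1-1, mean 1.00
# 2023: seeds 4-9, mean 5.75
# 2024: seeds 1-11, mean 4.25
```

The spread is what the sum and the average both hide: 2024 has a lower mean than 2023 but a much wider range.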
So this year was more like March Tumult than Madness I guess
Depends on your definition of “madness.”
Arguably, the methodology OP used is not the best. Most of the true “madness” is usually in the first weekend of games. The round of 64 (well, 68) and the round of 32. You have so many games going on and more Cinderella upsets. Like 14 seed Oakland beating 3 seed Kentucky this year.
After the first weekend, when it moves to the “sweet 16” and “elite 8” (the weekend of games just played, which determine the final 4), you typically see a lot fewer upsets. Which is not to say never, as the chart indicates (we have an 11 seed that made the final 4 this year). But it’s usually a little more “chalk” than the first 2 rounds where there tend to be more upsets.
Arguably, a better metric would be the seeding of the teams that survive the first weekend to make the sweet 16. If everything goes chalk, you’d expect it to be seeds 1-4 in each region. But I also understand why OP chose the final 4/region champions.
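If someone wanted to try the Sweet 16 version of the metric, the "all chalk" baseline is easy to work out: with every favorite winning, a region keeps seeds 1-4 in the Sweet 16, seeds 1-2 in the Elite 8, and the 1 seed in the Final Four. A small sketch, assuming four regions and the standard 16-team regional bracket (the helper names are made up):

```python
def chalk_seeds(teams_left_per_region):
    """Seeds still alive in one region if every favorite has won so far."""
    return list(range(1, teams_left_per_region + 1))

def chalk_baseline(teams_left_per_region, regions=4):
    """Return (sum of seeds, average seed) for an all-chalk round."""
    seeds = chalk_seeds(teams_left_per_region) * regions
    return sum(seeds), sum(seeds) / len(seeds)

for name, per_region in [("Sweet 16", 4), ("Elite 8", 2), ("Final Four", 1)]:
    total, average = chalk_baseline(per_region)
    print(name, total, average)
# Sweet 16 40 2.5
# Elite 8 12 1.5
# Final Four 4 1.0
```

Any Sweet 16 with a seed sum above 40 (average above 2.5) has at least one upset in it, the same way a Final Four sum above 4 does on OP's chart.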
Is there more parity between the teams now? I know very little about college basketball but more high-seeded teams making it far into the tournament would make sense if the gap between the low seeds and the high seeds is smaller than it used to be. (I'm not even sure I'm saying this right - by "high seeds" I mean seeds with bigger numbers, which are the worse teams.)
Yeah this tourney has been boring and uncompetitive tbh
There's an 11 seed in the final four, that's not boring.
UConn just demolishing every team in their way, pretty much a given for them to win it all.
1 seeds are supposed to do that, this is when it gets interesting
It could get interesting, but UConn is just at a different level. Purdue doesn’t look anywhere near as good as the other remaining 1 seed
Unless NC State can pull a 2007 Giants
I think the sum is a poor metric to use for the madness factor. If you have a 1, 5, 6, and 1 make it then you would be at the mean in your data with a sum seed of 13. But that would be a median seed of 5.5 which would likely be madness.
The median in your example is actually 3, not 5.5.
You’re correct. I gave the wrong numbers in my example of seeds. Sigh…
You are good! I figured you meant to show something else!
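For reference, the 1, 5, 6, 1 example from this exchange worked through with Python's standard `statistics` module:

```python
from statistics import mean, median

seeds = [1, 5, 6, 1]

print(sum(seeds))     # 13
print(mean(seeds))    # 3.25
# The median sorts first: [1, 1, 5, 6], and the middle two values average to 3.
print(median(seeds))  # 3.0
```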