Domain: Sports
Context: La Liga is the men's top professional football division of the Spanish football league system. The dataset contains information on all the teams that have participated in all the past tournaments. It has data about how many goals each team scored, conceded, how many times they came within the first 6 positions, how many seasons they have qualified, their best position in the past, etc. Data Description: Laliga.csv - The data set contains information on all the teams so far participated in all the past tournaments
Attribute Information: Pos - Position in among the list of all teams Team Seasons - how many seasons team has played so far Points - total number of points of the team GamesPlayed - total number of games played so far GamesWon - total number of games won so far GamesDrawn - total number of games drawn so far GamesLost - total number of games lost so far GoalsFor - total number of goals by the team GoalsAgainst - total number of goals against the team Champion - total number of times it team is a champion Runner-up - total number of times it team is a runner-up Third / Fourth/ Fifth/ Sixth - total number of times it team came in a third/fourth.fifth/sixth position Debut - debut year BestPosition - best position of the team
Objective: We want to use statistical techniques to come up with metrics with which can be used to gauge the winning team in the upcoming La Liga cup (Football tournament)
# importing libraries
import numpy as np
import pandas as pd
# reading data set from provided CSV to a dataframe 'laliga' using pandas
laliga = pd.read_csv('Laliga.csv')
# printing head of the dataframe 'laliga'
laliga.head()
Pos | Team | Seasons | Points | GamesPlayed | GamesWon | GamesDrawn | GamesLost | GoalsFor | GoalsAgainst | Champion | Runner-up | Third | Fourth | Fifth | Sixth | T | Debut | Since/LastApp | BestPosition | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Real Madrid | 86 | 4385 | 2762 | 1647 | 552 | 563 | 5947 | 3140 | 33 | 23 | 8 | 8 | 3 | 4 | 79 | 1929 | 1929 | 1 |
1 | 2 | Barcelona | 86 | 4262 | 2762 | 1581 | 573 | 608 | 5900 | 3114 | 25 | 25 | 12 | 12 | 4 | 6 | 83 | 1929 | 1929 | 1 |
2 | 3 | Atletico Madrid | 80 | 3442 | 2614 | 1241 | 598 | 775 | 4534 | 3309 | 10 | 8 | 16 | 9 | 7 | 6 | 56 | 1929 | 2002-03 | 1 |
3 | 4 | Valencia | 82 | 3386 | 2664 | 1187 | 616 | 861 | 4398 | 3469 | 6 | 6 | 10 | 11 | 10 | 7 | 50 | 1931-32 | 1987-88 | 1 |
4 | 5 | Athletic Bilbao | 86 | 3368 | 2762 | 1209 | 633 | 920 | 4631 | 3700 | 8 | 7 | 10 | 5 | 8 | 10 | 49 | 1929 | 1929 | 1 |
#checking shape of the 'laliga' dataframe
laliga.shape
(61, 20)
#checking data types of all columns
laliga.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 61 entries, 0 to 60 Data columns (total 20 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Pos 61 non-null int64 1 Team 61 non-null object 2 Seasons 61 non-null int64 3 Points 61 non-null object 4 GamesPlayed 61 non-null object 5 GamesWon 61 non-null object 6 GamesDrawn 61 non-null object 7 GamesLost 61 non-null object 8 GoalsFor 61 non-null object 9 GoalsAgainst 61 non-null object 10 Champion 61 non-null object 11 Runner-up 61 non-null object 12 Third 61 non-null object 13 Fourth 61 non-null object 14 Fifth 61 non-null object 15 Sixth 61 non-null object 16 T 61 non-null object 17 Debut 61 non-null object 18 Since/LastApp 61 non-null object 19 BestPosition 61 non-null int64 dtypes: int64(3), object(17) memory usage: 5.5+ KB
# using replace function to replace '-' with '0' which will allow us arithamatic operations
laliga.replace('-',0,inplace=True)
laliga
Pos | Team | Seasons | Points | GamesPlayed | GamesWon | GamesDrawn | GamesLost | GoalsFor | GoalsAgainst | Champion | Runner-up | Third | Fourth | Fifth | Sixth | T | Debut | Since/LastApp | BestPosition | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Real Madrid | 86 | 4385 | 2762 | 1647 | 552 | 563 | 5947 | 3140 | 33 | 23 | 8 | 8 | 3 | 4 | 79 | 1929 | 1929 | 1 |
1 | 2 | Barcelona | 86 | 4262 | 2762 | 1581 | 573 | 608 | 5900 | 3114 | 25 | 25 | 12 | 12 | 4 | 6 | 83 | 1929 | 1929 | 1 |
2 | 3 | Atletico Madrid | 80 | 3442 | 2614 | 1241 | 598 | 775 | 4534 | 3309 | 10 | 8 | 16 | 9 | 7 | 6 | 56 | 1929 | 2002-03 | 1 |
3 | 4 | Valencia | 82 | 3386 | 2664 | 1187 | 616 | 861 | 4398 | 3469 | 6 | 6 | 10 | 11 | 10 | 7 | 50 | 1931-32 | 1987-88 | 1 |
4 | 5 | Athletic Bilbao | 86 | 3368 | 2762 | 1209 | 633 | 920 | 4631 | 3700 | 8 | 7 | 10 | 5 | 8 | 10 | 49 | 1929 | 1929 | 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
56 | 57 | Xerez | 1 | 34 | 38 | 8 | 10 | 20 | 38 | 66 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2009-10 | 2009-10 | 20 |
57 | 58 | Condal | 1 | 22 | 30 | 7 | 8 | 15 | 37 | 57 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1956-57 | 1956-57 | 16 |
58 | 59 | Atletico Tetuan | 1 | 19 | 30 | 7 | 5 | 18 | 51 | 85 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1951-52 | 1951-52 | 16 |
59 | 60 | Cultural Leonesa | 1 | 14 | 30 | 5 | 4 | 21 | 34 | 65 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1955-56 | 1955-56 | 15 |
60 | 61 | Girona | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2017-18 | 2017-18 | 9 |
61 rows × 20 columns
# converting values of 'Debut' column into string datatype
laliga['Debut'] = laliga['Debut'].astype(str)
# getting team details to new dataframe 'Debut Year' based on Debut in between 1930 to 1980 (including 1930 but excluding 1980)
Debut_Year = laliga[laliga['Debut'].str[:4].between('1930','1980')]
# printing team name and debut year from 'Debut_Year' dataframe
Debut_Year[['Team','Debut']]
Team | Debut | |
---|---|---|
3 | Valencia | 1931-32 |
5 | Sevilla | 1934-35 |
8 | Zaragoza | 1939-40 |
9 | Real Betis | 1932-33 |
10 | Deportivo La Coruna | 1941-42 |
11 | Celta Vigo | 1939-40 |
12 | Valladolid | 1948-49 |
14 | Sporting Gijon | 1944-45 |
15 | Osasuna | 1935-36 |
16 | Malaga | 1949-50 |
17 | Oviedo | 1933-34 |
18 | Mallorca | 1960-61 |
19 | Las Palmas | 1951-52 |
21 | Granada | 1941-42 |
22 | Rayo Vallecano | 1977-78 |
23 | Elche | 1959-60 |
25 | Hercules | 1935-36 |
26 | Tenerife | 1961-62 |
27 | Murcia | 1940-41 |
28 | Alaves | 1930-31 |
29 | Levante | 1963-64 |
30 | Salamanca | 1974-75 |
31 | Sabadell | 1943-44 |
32 | Cadiz | 1977-78 |
34 | Castellon | 1941-42 |
37 | Cordoba | 1962-63 |
39 | Recreativo | 1978-79 |
40 | Burgos CF | 1971-72 |
41 | Pontevedra | 1963-64 |
46 | Gimnastic | 1947-48 |
49 | Alcoyano | 1945-46 |
50 | Jaen | 1953-54 |
52 | AD Almeria | 1979-80 |
54 | Lleida | 1950-51 |
57 | Condal | 1956-57 |
58 | Atletico Tetuan | 1951-52 |
59 | Cultural Leonesa | 1955-56 |
# copying 'Team' and 'Points' column to new dataframe 'laliga_sort'
laliga_sort = laliga[['Team','Points']].copy()
# converting values of 'Points' column into int datatype
laliga_sort['Points'] = laliga_sort['Points'].astype(int)
# sorting the dataframe 'laliga_sort' based on 'Points' value
laliga_sort.sort_values(by='Points', ascending=False, inplace=True)
#printing head as top 5 teams in terms of points
laliga_sort.head(5)
Team | Points | |
---|---|---|
0 | Real Madrid | 4385 |
1 | Barcelona | 4262 |
2 | Atletico Madrid | 3442 |
3 | Valencia | 3386 |
4 | Athletic Bilbao | 3368 |
# converting values of 'GoalsFor' and 'GoalsAgainst' column into int datatype
laliga['GoalsFor'] = laliga['GoalsFor'].astype(int)
laliga['GoalsAgainst'] = laliga['GoalsAgainst'].astype(int)
# defining function 'Goal_diff_count()' as per the problem statement needs
def Goal_diff_count():
laliga['Goal_diff_count'] = laliga['GoalsFor']-laliga['GoalsAgainst']
return laliga[['Team','Goal_diff_count']]
# calling 'Goal_diff_count()' function and storing it to 'Goal'
Goal = Goal_diff_count()
# sorting data and printing data from 'Goal'
Goal.sort_values(by = 'Goal_diff_count',ascending=False)
Team | Goal_diff_count | |
---|---|---|
0 | Real Madrid | 2807 |
1 | Barcelona | 2786 |
2 | Atletico Madrid | 1225 |
4 | Athletic Bilbao | 931 |
3 | Valencia | 929 |
... | ... | ... |
27 | Murcia | -385 |
19 | Las Palmas | -399 |
14 | Sporting Gijon | -399 |
12 | Valladolid | -413 |
13 | Racing Santander | -525 |
61 rows × 2 columns
# new column 'Goal_diff_count' added to dataframe so shape is changed from (61, 20)to (61, 21)
laliga.shape
(61, 21)
# conforming addition of new column to dataframe by printing head
laliga.head()
Pos | Team | Seasons | Points | GamesPlayed | GamesWon | GamesDrawn | GamesLost | GoalsFor | GoalsAgainst | ... | Runner-up | Third | Fourth | Fifth | Sixth | T | Debut | Since/LastApp | BestPosition | Goal_diff_count | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Real Madrid | 86 | 4385 | 2762 | 1647 | 552 | 563 | 5947 | 3140 | ... | 23 | 8 | 8 | 3 | 4 | 79 | 1929 | 1929 | 1 | 2807 |
1 | 2 | Barcelona | 86 | 4262 | 2762 | 1581 | 573 | 608 | 5900 | 3114 | ... | 25 | 12 | 12 | 4 | 6 | 83 | 1929 | 1929 | 1 | 2786 |
2 | 3 | Atletico Madrid | 80 | 3442 | 2614 | 1241 | 598 | 775 | 4534 | 3309 | ... | 8 | 16 | 9 | 7 | 6 | 56 | 1929 | 2002-03 | 1 | 1225 |
3 | 4 | Valencia | 82 | 3386 | 2664 | 1187 | 616 | 861 | 4398 | 3469 | ... | 6 | 10 | 11 | 10 | 7 | 50 | 1931-32 | 1987-88 | 1 | 929 |
4 | 5 | Athletic Bilbao | 86 | 3368 | 2762 | 1209 | 633 | 920 | 4631 | 3700 | ... | 7 | 10 | 5 | 8 | 10 | 49 | 1929 | 1929 | 1 | 931 |
5 rows × 21 columns
# printed first entry of sorted dataframe by calling 'Goal_diff_count()' function
Goal_diff_count().head(1) # max goal difference
Team | Goal_diff_count | |
---|---|---|
0 | Real Madrid | 2807 |
# printed last entry of sorted dataframe by calling 'Goal_diff_count()' function
Goal_diff_count().tail(1) # min goal difference
Team | Goal_diff_count | |
---|---|---|
60 | Girona | 0 |
# converting values of 'GamesWon' and 'GamesPlayed' column into int datatype
laliga['GamesWon'] = laliga['GamesWon'].astype(int)
laliga['GamesPlayed'] = laliga['GamesPlayed'].astype(int)
# calculate winning percentage for each team and storing value to new column 'Win Per'
laliga['Win Per'] = (laliga['GamesWon']/laliga['GamesPlayed']) *100
# replacing Null values with 0%
laliga['Win Per'].fillna(0,inplace = True)
# printing team name and winning percentage for each team
laliga[['Team','Win Per']]
Team | Win Per | |
---|---|---|
0 | Real Madrid | 59.630702 |
1 | Barcelona | 57.241130 |
2 | Atletico Madrid | 47.475134 |
3 | Valencia | 44.557057 |
4 | Athletic Bilbao | 43.772629 |
... | ... | ... |
56 | Xerez | 21.052632 |
57 | Condal | 23.333333 |
58 | Atletico Tetuan | 23.333333 |
59 | Cultural Leonesa | 16.666667 |
60 | Girona | 0.000000 |
61 rows × 2 columns
# new column 'Win Per' added to dataframe so shape is changed from (61, 21) to (61, 22)
laliga.shape
(61, 22)
# printing top 5 teams with highest winning percentage
laliga[['Team','Win Per']].head(5)
Team | Win Per | |
---|---|---|
0 | Real Madrid | 59.630702 |
1 | Barcelona | 57.241130 |
2 | Atletico Madrid | 47.475134 |
3 | Valencia | 44.557057 |
4 | Athletic Bilbao | 43.772629 |
# converting values of 'Points' and 'BestPosition' column into int datatype
laliga['Points'] = laliga['Points'].astype(int)
laliga['BestPosition'] = laliga['BestPosition'].astype(int)
# grouping teams based on 'BeastPosition' column
group_Best = laliga[['Team','Points','BestPosition']].groupby('BestPosition')
# computing sum of grouped values on 'BestPosition' and print them
group_Best['Points'].sum()
BestPosition 1 27933 2 6904 3 5221 4 6563 5 1884 6 2113 7 1186 8 1134 9 96 10 450 11 445 12 511 14 71 15 14 16 81 17 266 19 81 20 34 Name: Points, dtype: int32
Completed by: Ganpat Patel Email: ganpat.patel.012@gmail.com