Female Discrimination

Since time immemorial, it has been documented that women in our society have faced discrimination in one form or another, and this extends to the education of girls. One can therefore claim that males get more opportunities for education than females. There are multiple ways to examine this claim. We might offer educated arguments and quote scriptures and news reports to say that this is not the case and that our society is fair. History tells otherwise: there is evidence that women were not allowed to leave their homes to study, and that marriage was the only concern of their parents.

An educated man of the 21st century, relying on the moral values, ethics and logic taught to him in school, might claim that our society is fair now, that no such discrimination exists and that life is a fair game. We have leaders who are women, we have top-level scientists who are women, and we have women in every field who have excelled and done better than men. He can say this from the comfort of his luxury home. Rich people have choices; it is when a person faces a dilemma that his ethics, values and moral principles come under question. Consider a housemaid who has lost her husband and has two children, one boy and one girl. She can afford to send only one child to school. Should she send the boy or the girl? Now the ethics and moral principles found in literature are put to the test.

This argument can go on and on, and we might keep going in circles, but there is one thing both sides would accept that might end the debate, and that is data. Even so, a snapshot of the data at a single point in time might not be sufficient evidence, and the data itself can be questioned for biases. I assume such objections might be valid, but the world is not as black and white as it seems today.
So let's get into a scientific, or to be more specific a statistical, discussion that might shed some light on our dilemma.

Some Interesting Data:

The development of people in India is documented in a report called the India Human Development Report. It includes the Human Development Index (HDI), a composite statistic of life expectancy, education (mean years of schooling completed and expected years of schooling upon entering the education system) and per capita income indicators. The HDI is used to rank countries into tiers of human development. According to Wikipedia, the United States has an HDI above 0.950, while some countries in Africa have an HDI below 0.399. India stands somewhere in the middle. The HDI is the geometric mean of the Life Expectancy Index, the Education Index and the Income Index. For this analysis, given the data that is openly available, we refer to the India Human Development Report 2011: Towards Social Inclusion by the Institute of Applied Manpower Research, Planning Commission. The data.gov.in portal, made available to us by our government, hosts the dataset net-attendance-ratio-primary-level-social-groups-urban-2007-08, which I have used for this study.
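To make the geometric-mean definition of the HDI concrete, here is a minimal Python sketch. The function name hdi and the three index values are made up for illustration; they are not taken from the report.

```python
import math


def hdi(life_expectancy_index, education_index, income_index):
    """Geometric mean of the three dimension indices (each between 0 and 1)."""
    indices = (life_expectancy_index, education_index, income_index)
    if not all(0 <= x <= 1 for x in indices):
        raise ValueError("each index must lie between 0 and 1")
    return math.prod(indices) ** (1 / 3)


# Hypothetical index values, chosen only to illustrate the formula.
example_hdi = hdi(0.70, 0.50, 0.60)
print(round(example_hdi, 3))
```

Because it is a geometric rather than an arithmetic mean, a low value in one dimension pulls the overall index down more than a simple average would.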

Fig 1: The dataset imported into Google Sheets
Figure 1 shows the data we imported into Google Sheets: the states, the category of each state, and the attendance percentages for the different backward classes, split by males and females. The question now is how to verify the claim we boldly made at the start of this article. Do we have a tool for it? The answer is yes: statistical hypothesis testing. Tests such as Student's t-test and ANOVA (analysis of variance) can help us verify the claim. These methods do not require a huge amount of data; they work on small samples, provided the observations in each group are roughly normally distributed with similar variances, and therein lies their beauty. If there is any discrimination, we should expect the mean attendance of males and females in the Scheduled Tribes (ST) group to differ. We therefore define our null and alternative hypotheses.
H0: There is no difference between the mean male and female attendance percentages for the ST group.
HA: There is a difference between the mean male and female attendance percentages for the ST group.

I entered the data manually in R, labelling each observation M for male or F for female. Let us use R to run an ANOVA test and determine which of the two hypotheses the data supports:
Fig 2: Results of running ANOVA test

The p-value comes out to be 0.75, which is much greater than 0.05. We therefore fail to reject the null hypothesis: the data shows no statistically significant difference between male and female attendance percentages.
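For readers who want to see what the test computes, here is a rough Python sketch of a one-way ANOVA using only the standard library. It is not the R session from Fig 2. The attendance figures are made up, and the helper names one_way_anova, t_pdf and two_group_p_value are my own. The p-value shortcut assumes exactly two groups, where the F statistic equals the square of the t statistic, and it integrates the t density numerically with Simpson's rule.

```python
import math
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_group_p_value(f_stat, df_within, steps=2000):
    """Two-sided p-value for exactly two groups, using F = t^2."""
    t = math.sqrt(f_stat)
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df_within) + t_pdf(t, df_within)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df_within)
    central = 2 * total * h / 3  # probability that T lies in (-t, t)
    return max(0.0, 1.0 - central)


# Hypothetical attendance percentages, NOT the data.gov.in values.
male = [92.0, 88.5, 79.0, 95.5, 67.0, 84.0, 90.0, 73.5]
female = [94.0, 61.0, 87.5, 96.0, 55.0, 89.0, 91.5, 70.0]

f_stat, df_between, df_within = one_way_anova([male, female])
p_value = two_group_p_value(f_stat, df_within)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

With more than two groups the F statistic is computed the same way, but the p-value would need the F distribution itself, which a statistics package such as R provides directly.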

A Twist in the tale
Fig 3: Attendance Plots for ST  
A box plot is read as follows: the line inside the box is the median, the lower edge of the box is the first quartile (Q1), the upper edge is the third quartile (Q3), and the whiskers extend to the minimum and maximum values that are not outliers. Let us note some interesting points in figure 3. The median female attendance is lower than the male median. The interquartile range is also larger for females than for males, which, judging by the Q1 values, indicates that many females have much lower attendance than males. There are two outliers for males, but the male minimum is still somewhere near 45%, whereas the female minimum falls below 20%. The question that naturally arises is why the hypothesis test points the other way. Look at the Q3 value for females: a lot of observations lie above 85%, which is not so for males. That might explain (this is my assumption) why the difference in means is not significant in this sample. Let us now turn our attention to the SC group.
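The quantities a box plot draws can be computed directly. The sketch below is a rough illustration in Python: the function name box_plot_summary and the sample values are hypothetical, and it assumes the common convention that points more than 1.5 times the interquartile range beyond the box are drawn as outliers. Plotting tools may compute the quartiles slightly differently.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Quartiles, IQR, whisker ends and outliers for one group."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers reach the most extreme points that are still inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical attendance percentages, not the values behind Fig 3.
sample = [18.0, 62.0, 75.5, 81.0, 86.0, 88.5, 91.0, 93.5, 96.0]
for name, value in box_plot_summary(sample).items():
    print(name, value)
```

Comparing q1, median and iqr between two groups gives the same reading as comparing the two boxes by eye.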
Fig 4: Box plot for SC group of males and females
   
Fig 5: Results of ANOVA test for SC group
From figures 4 and 5 we reach a similar conclusion, though the effect is smaller. Thus it might be safe to say that although, on the whole, the data shows no statistically significant discrimination, a certain fraction of the population might still be facing this problem. On that positive note, we end our analysis.
 
                                                 







     
