Summary Statistics With Two Variables – R For Economists Basics 9

This time, let's say we want to look at the relationship between, oh, I don't know, whether you're married and... what else have we got here? What can we look at? Whether you're married and whether you're in a major metropolitan area, an SMSA. So let's say we create a table looking at the cross-tabulation of being married and being in an SMSA.

If we do that, it will show us the cross-tabulation. On the left we've got 0 and 1, so that's showing us whether you're married or not. On the top we have SMSA, 0 or 1. Now, unfortunately, this is one of my pet peeves in here: the table function is so nice, but it doesn't actually show you which variable is on which axis. So we do have to do a little bit of monkeying around here to get some labels on these axes so that it's a little bit easier to read.

Now, if we didn't know how to do this, we would of course look in the help file for table, and it would tell us how to do this. In particular, it tells us to use dnn, the dimension names, right there. So, dnn: I want it to label this married for the first one and smsa for the second one. If we rerun that, now it'll say, okay, we've got married over here on the left and SMSA on the top. So we know, for example, there are 160 people who are not married but are in an SMSA, 100 people who are married but not in an SMSA, 220 people who are both married and in an SMSA, and 46 people who are neither.
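As a rough sketch of the labeling step just described, the code below builds a cross-tabulation with and without dnn. The small wage1 data frame here is a made-up stand-in with the same column names as in the video, so the counts will not match the ones quoted above. The last line uses prop.table, which is one base R option for a percentage table, not necessarily the one the video has in mind.

```r
# Toy stand-in for the wage1 data (values are invented for illustration).
wage1 <- data.frame(
  married = c(0, 0, 1, 1, 1, 0, 1, 0),
  smsa    = c(0, 1, 1, 0, 1, 1, 1, 1)
)

# Plain cross-tabulation: rows come from the first argument, columns from
# the second, but the printout does not say which variable is which.
table(wage1$married, wage1$smsa)

# dnn supplies names for the two dimensions, so each axis is labeled.
tab <- table(wage1$married, wage1$smsa, dnn = c("married", "smsa"))
tab

# One base R way to turn counts into percentages of all observations.
round(100 * prop.table(tab), 1)
```

Cells can be pulled out by label, for example tab["1", "1"] for people who are both married and in an SMSA.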

To turn this into a percentage table, there are different ways you can do that; I'd recommend going on Google to figure out how.

Okay, so we have gotten our correlations, we have our tables. We can also do a t-test to test for the equality of means between two different variables. Now, there aren't really any variables in this data set that it makes sense to do this with, so we're going to sort of do one at random. Let's see if the average number of years of job tenure is equal to the average number of years of job experience. We wouldn't really expect them to be the same, because if you've had a job before, you'd have more experience than tenure at your current job. But let's check it anyway.

The syntax is going to be the same as before, except now we're going to check for the difference of means between two variables. So we use t.test, just like we did before. A lot of these commands, you'll notice, are similar to what we did before, but this time we're putting in two variables and it just sort of knows what to do. This time we're doing wage1's experience variable. Again, remember, t.test is testing something against zero, so we're going to look for the difference between these two and check whether that difference is zero. If the difference is zero, then they're the same, right? So we're going to do experience minus wage1$tenure.

This will give us all the same stuff we got from t.test last time. This time we do have a p-value that's very, very small, showing us that the means of these two variables are in fact very different. The confidence interval of that difference is between 11 and 13, a pretty big difference, and it gives us the average difference between the two.
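Here is a minimal sketch of that test, under some assumptions: the wage1 data frame below is invented, and the column name exper for experience is a guess at how the variable is spelled in the real data. The idea is the one described above: take the difference of the two variables and let t.test check whether its mean is zero.

```r
# Toy stand-in for wage1 (made-up values; exper is an assumed column name).
wage1 <- data.frame(
  exper  = c(2, 22, 2, 44, 7, 9, 15, 5),
  tenure = c(0, 2, 0, 28, 2, 8, 7, 3)
)

# One-sample t-test on the difference: is mean(exper - tenure) equal to 0?
diff_test <- t.test(wage1$exper - wage1$tenure)
diff_test

# Individual pieces of the result.
diff_test$estimate   # average difference
diff_test$conf.int   # confidence interval for that difference
diff_test$p.value    # p-value for "the difference is zero"

# The same test written as a paired t-test on the two columns.
paired_test <- t.test(wage1$exper, wage1$tenure, paired = TRUE)
paired_test$statistic
```

Subtracting first and calling the paired version give the same t statistic, so which one to write is mostly a matter of taste.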

Okay, so far we've looked at correlations, we've done cross-tabulations, and we've also done t-tests testing the difference of means of two different variables. So what is next? After this, what we're going to do is ask a question: okay, well, maybe we want to get means within groups. For example, we did that cross-tabulation. Maybe what we're interested in is not necessarily how many people are in those groups; we're interested in, say, the average wage within those groups. Are married people in SMSAs getting higher wages than non-married people in SMSAs? That's an interesting question.

Now, we're going to do that with the aggregate command. What aggregate does is basically take a data set, or a data frame, and say, okay, what do I want to aggregate this over? Maybe I want to aggregate it over gender. So I would say, okay, I want to calculate the average wage for, say, people with female equals zero, and again for people with female equals one. It takes the female-equals-zero people and just clumps them together, takes the female-equals-one people and clumps them together. That's what we're going to do here, so we're going to get average wage by marital status.
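To make the "clump them together" idea concrete, the sketch below does the grouping by hand with split and sapply, using the female example from the explanation. This is only an illustration of what happens conceptually, on made-up data; it is not how aggregate is implemented internally.

```r
# Toy stand-in for wage1 (invented values).
wage1 <- data.frame(
  wage   = c(3, 5, 4, 8, 5, 11),
  female = c(1, 0, 1, 0, 1, 0)
)

# Step 1: clump the wages together by the value of female.
groups <- split(wage1$wage, wage1$female)
groups

# Step 2: apply the summary function (here the mean) within each clump.
group_means <- sapply(groups, mean)
group_means
```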
Okay, we do it with the aggregate command. Now, with the aggregate command, the first thing it's going to take is something called a formula. This is something we're going to come back to again and again. Basically, what we're saying here is we want to know the average wage by marital status, and so we're sort of modeling wage as a function of marital status. We're saying, hey, wage might be different depending on your marital status, right? So we're going to say that wage follows a function of that. That's the tilde there. That particular character should be up on the top left of your keyboard, above the Tab key. And we're going to do it as a function of marital status, married being whether you're married or not.

I need to tell it what data set I'm working with, so data is equal to wage1. And finally, I need to tell it what function I want it to calculate. It knows that I want to aggregate over marital status, but it doesn't know what I want to do with wage. Do I want to average it, or am I going to take the median, or what? I'm going to say FUN (they go all capitals, FUN, for lots of fun and also for function), and FUN is the mean. I want to take the mean of wage over different marital statuses.

If I do that, it will tell me: if you're not married, your average wage is 4.8; if you are married, your average wage is 6.6. Apparently I made a good decision.
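Putting the three pieces together (formula, data, FUN), the call presumably looks something like the sketch below. The wage1 data frame is again a made-up stand-in, so the averages differ from the 4.8 and 6.6 reported above.

```r
# Toy stand-in for wage1 (invented values).
wage1 <- data.frame(
  wage    = c(4, 6, 5, 7, 3, 8),
  married = c(0, 1, 0, 1, 0, 1)
)

# wage ~ married: wage as a function of marital status.
# data says where the variables live; FUN is what to compute per group.
by_married <- aggregate(wage ~ married, data = wage1, FUN = mean)
by_married

# Swapping the summary only means changing FUN, for example the median.
aggregate(wage ~ married, data = wage1, FUN = median)
```

The result is an ordinary data frame with one row per group, so it can be used like any other data set afterwards.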

Now, maybe we don't want to do just that. We want to get more detailed than that, because remember, the original question I had is: are people in SMSAs who are married getting more money than people who are in SMSAs but not married? So I need to do it not just over marital status, but over marital status and SMSA. Conveniently, it's pretty easy to extend our function. All we've got to do is say, well, we want to do it over marital status and SMSA. Well, what's another word for "and"? Plus. So I'm going to do married + smsa, and this will give it over all four categories, right? If you're not married and you're not in an SMSA, your wage is 3.9. If you are married but not in an SMSA, it's 5.3. If you're not married but in an SMSA, it's 5.1. And if you're both married and in an SMSA, it's 7.2.
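The two-variable version only changes the right-hand side of the formula. A small sketch on made-up data (two people per cell, so the means are easy to check by eye; the real numbers are the ones quoted above):

```r
# Toy stand-in for wage1 (invented values, two observations per cell).
wage1 <- data.frame(
  wage    = c(3, 5, 5, 7, 4, 6, 7, 9),
  married = c(0, 0, 1, 1, 0, 0, 1, 1),
  smsa    = c(0, 0, 0, 0, 1, 1, 1, 1)
)

# married + smsa: one group for every combination of the two variables.
by_both <- aggregate(wage ~ married + smsa, data = wage1, FUN = mean)
by_both

# Pull out a single cell, e.g. married people living in an SMSA.
by_both$wage[by_both$married == 1 & by_both$smsa == 1]
```

Each row of by_both corresponds to one cell of the earlier cross-tabulation, with the mean wage in place of the count.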

All right, so what we've done is look at two variables together and how they vary together. We looked at correlation, so we know how to calculate a correlation between two variables. We talked about how to look at a cross-tabulation of two discrete variables: if I want to look at all the different values of one variable and all the different values of another variable and see how many people fall into each of those cells, I can do a cross-tabulation with table. We talked about how to do a t-test testing the difference of means of two different variables. And finally, we talked about how to aggregate our data set over different values. In this case, we aggregated our data over married and calculated the average wage within those two groups, or within the four different groups of married versus SMSA. So we sort of stuck our cross-tabulation in there and calculated the mean in each of those cells.

All right, so that's it for summary statistics. In the next video, we're going to start talking about some other really good ways of looking at your data, in particular plotting and graphics, which is of course indispensable for any econometrician.