Data Science Initiative Introduction To R Bootcamp (part 2)

Specific question: we want the factor. Okay, so one of the things we can do: let’s go back over here and say, hey, what if I try to make this into a factor? Now it says, sure, I’ll go along with that, and lo and behold, we get our plot.

Okay, and now what’s a plus over here? So we see this is the number of bedrooms, and here are the two observations corresponding to our studio, or what we think is a studio, and the plus means, oh, that’s not good. That’s not good. The plot is fine; the data are not. This says a plus with zero bedrooms, the price is 2,500, and how many baths does it have? Five. If this is correct... well, maybe it’s one and a half baths, or maybe a half bath; I have no idea. This is probably an error, so we’ve already found some dubious data here that we want to go back and check. We could have found it with the commands we were running earlier today, but a plot is much easier to look at, because it may suggest things we didn’t know about the number of bedrooms.

Where do we get five baths? Five baths on its own isn’t too bad, but there are more baths than there are bedrooms, lots more baths than bedrooms. The bath variable doesn’t look like it was calculated or reported correctly, but because there are so many of them, it’s probably me who actually got it wrong.
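The plot made the zero-bedroom, five-bath listing jump out. Once you know what to look for, you can write the same check as a filter. The sketch below uses a small hypothetical data frame in place of the Davis rentals. The column names bedrooms, bath and price, and all the values, are assumptions. It uses base R’s subset() to pull out the kinds of rows flagged above.

```r
# Hypothetical toy data standing in for the Davis rental listings;
# the column names and values are assumptions for illustration only.
rentals <- data.frame(
  bedrooms = c(0, 0, 1, 2, 3, 2),
  bath     = c(1, 5, 1, 1, 2, 3),
  price    = c(1100, 2500, 1300, 1600, 2400, 2000)
)

# Rows that report more baths than bedrooms.
more_baths <- subset(rentals, bath > bedrooms)
print(more_baths)

# The most suspicious case: a "studio" (zero bedrooms) with several baths.
studio_suspect <- subset(rentals, bedrooms == 0 & bath > 1)
print(studio_suspect)
```

The first filter also flags an ordinary one-bath studio, so a blunt rule like this still needs a human look. The plot does that job well.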

Okay, so what are we going to do next? What we did was change this to... hang on a second, is that what I typed? This is in my notes; this is what I typed up here. What the heck is the difference? Can you tell? You’d have to read the whole thing and compare them bit by bit. This is where it would be easier if we broke this up into two different pieces, but the difference is ordered versus factor.
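Here is a base R sketch of the difference between the two functions. The rental types and letter grades are made up for illustration. factor(), ordered(), levels() and is.ordered() are standard base R. The example also shows that levels() returns only the unique labels, not one value per observation. That distinction matters when mapping a variable to an aesthetic.

```r
# factor(): categories with no inherent order (made-up rental types).
type <- factor(c("apartment", "house", "duplex", "house"))
print(levels(type))   # the unique labels: "apartment" "duplex" "house"
print(length(type))   # one value per observation: 4

# ordered(): categories whose levels have a meaningful order.
# Levels are listed from lowest to highest, so C < B < A.
grades <- ordered(c("B", "A", "C", "A"), levels = c("C", "B", "A"))
print(grades > "B")   # FALSE TRUE FALSE TRUE
print(max(grades))    # A (the highest level)

print(is.ordered(grades))  # TRUE
print(is.ordered(type))    # FALSE
```

Comparing an unordered factor with > is not meaningful. R returns NA with a warning, which is the practical difference between the two.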

Okay, factor() makes a categorical variable. ordered() also makes a categorical variable, but it says that the levels are ordered. Take your grades, for example: A, B, C are not just categories; A is better than B, which is better than C, which is better than D. Or sometimes we say a person is young, middle-aged or older: there’s an ordering to these things, but they’re still categorical, because we can’t treat them as continuous variables. So ordered or factor, it doesn’t really matter too much in this particular case.

Okay, now let’s do this; we’ll do the same sort of thing. What’s this going to do? This just uses colour rather than shape. Everyone with me? What’s the difference? It’s down here.
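This is a minimal ggplot2 sketch of the two versions being compared. It assumes a data frame with hypothetical columns sqft, bedrooms and bath; the real Davis data and the exact axes used in the session are assumptions here. bath is wrapped in factor() because ggplot2 refuses to map a continuous variable to shape. If you map a numeric bath to colour without factor(), you get a continuous colour gradient instead of distinct colours.

```r
library(ggplot2)

# Hypothetical stand-in for the Davis rentals; names and values are assumptions.
rentals <- data.frame(
  sqft     = c(450, 700, 900, 1200, 1800, 2400),
  bedrooms = c(0, 1, 2, 2, 3, 4),
  bath     = c(1, 1, 1, 2, 2, 3)
)

# Version 1: baths shown by plotting character. factor() makes bath discrete,
# which the shape scale requires.
p_shape <- ggplot(rentals, aes(x = sqft, y = bedrooms, shape = factor(bath))) +
  geom_point()

# Version 2: the same plot, with baths shown by colour instead of shape.
p_colour <- ggplot(rentals, aes(x = sqft, y = bedrooms, colour = factor(bath))) +
  geom_point()

# Draw them with print(p_shape) and print(p_colour).
```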

If you want to compare: this one says I’m mapping bath to the shape aesthetic, and I have to make it a categorical variable; this one says I’m mapping bath to colour.

So how am I going to get four dimensions? Suppose I want to show type as well. How do I change this so that I can also show the type of rental unit on the plot? And then we’ll go for broke. shape = levels(type)? levels() will just give me back the unique labels, but I need a hundred and seventy-five values, one per observation, rather than the five levels. The levels are important in a factor, because they are the set of unique values, but we also need the observations that map to those values; we need to group them, essentially. So what’s happening here is that instead of saying, "oh, you’ve got five bathrooms, you get a plus; you’ve got two bathrooms, you get a circle," it just wanders its way through.

So you see what I’ve got here: shape = bath, colour = bath. What if I just mix these together by saying shape = bath and colour = type? That should get us approximately where we want, I hope. And again, we’d better make it a factor. What was this, type? Oh, we don’t need that, do we? Of course: what’s the class of Davis$type? It’s already a factor, so there’s no need for us to do that. So we say shape = type now. And this is one of the very nice things ggplot does: it just takes care of legends, which are a pain to do yourself. Now the plotting character shows us the type of rental unit, the colour shows us how many baths there are, and now we can basically see... Notice, by the way:

These are our houses, all houses, which correspond to large square footage, which is no surprise. But now what we’re actually seeing is that there are some large ones over here, shown as circles. What are these, these things right here? This is a duplex, a large duplex with several bedrooms, with three bedrooms and so forth.

Okay, so now, is that plot okay? What would you change in that plot? The scale? In what way would you change the scale? I’m going to run out of space; this is as big as I can make it. I mean, I could change this to go from 2,000 to 3,000, I could change the range, but then I’d miss all the other ones. It’s a real pain to zoom in nicely; I’d potentially have to leave out a bunch of stuff. How would I do that? Well, this is discrete: you have two bedrooms; you can’t have 1.9627 bedrooms.

So we could just jitter these, that is, add random noise to them, and that would allow us to see them in a slightly different way. Let’s try this one: there is a function called jitter(). There you go. There’s another way to do this in ggplot2, but here we’re mixing regular R with ggplot. This makes sense here, but you can’t do this if this is a continuous variable, or you’re going to see one value that’s now over there and another one that’s now over there.

But they’re the same value. That’s not in the spirit of scatter plots, but now we do get to see a little more; it makes things a little clearer. So rather than changing the scale to see things, we’re using the fact that everybody understands that this means one bedroom. And if they look here and say, "how come you have 1.2 bedrooms?", you say, "it’s just an artifact of the way we’re trying to visualize the data." But that had better be clear to everybody. Now you have a slightly better plot. Anything else you’d like to change? Pardon? Oh, sorry, there we go: I just added jitter to the bedrooms.

There’s another way to do this; there are two ways to do this in ggplot. You can tell it to jitter the points, or there’s another thing called dodging, where things that would be on top of each other try to separate themselves, which is useful for bars and stuff like that. It’s the more general principle, but here is what jitter() of bedrooms actually does. Let’s look at the first three: Davis$bedrooms. Again, I’d love you to get really familiar with this: Davis$bedrooms pulls out the bedrooms, and square bracket 1:3 just shows me the first three. That’s no fun, so let’s look at more cases: now there are the first ten. And if I just do jitter() of this, it just changes them. So we are changing the values, but the changed values get thrown away: just for the purposes of our computation, it computes a new variable, which is jitter of bedrooms, and then uses that when making the plot.

Anything else you’d like to change? I see an NA. What’s wrong with the NA? I particularly like that to find the NA you just go and look at the gray one. There are a lot of NAs, I think. The NA isn’t one of the values in the data, but ggplot treats it as a level. Likewise with the ordered bath: in this case we can see where the NA for the bath is. What else would you like to change?
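Putting the pieces together, here is a sketch of the plot as it stands at this point. It again uses hypothetical toy data: the columns sqft, bedrooms, bath and type, and the choice of sqft for the x axis, are assumptions. The sketch does four things:

- Jitters bedrooms with base R’s jitter().
- Shows the ggplot2-native alternative, position_jitter().
- Includes one missing bath value, so you can see that NA is not among the factor’s levels even though ggplot2 still draws it (in grey by default) and lists it in the legend.
- Uses labs() to replace the default axis and legend titles, jitter(bedrooms) and ordered(bath). This anticipates the label clean-up discussed next.

```r
library(ggplot2)

# Hypothetical toy data; column names, values and axis choice are assumptions.
rentals <- data.frame(
  sqft     = c(450, 700, 900, 1200, 1800, 2400, 1000),
  bedrooms = c(0, 1, 2, 2, 3, 4, 2),
  bath     = c(1, 1, 1, 2, NA, 3, 1),
  type     = factor(c("apartment", "apartment", "duplex", "duplex",
                      "house", "house", "apartment"))
)

set.seed(42)  # jitter() draws random noise; a fixed seed makes this output repeatable

# jitter() returns perturbed copies; the bedrooms column itself is untouched.
print(rentals$bedrooms[1:3])
print(jitter(rentals$bedrooms[1:3]))

# NA is not one of the levels, but the value is still missing in the data.
bath_ord <- ordered(rentals$bath)
print(levels(bath_ord))      # "1" "2" "3"
print(sum(is.na(bath_ord)))  # 1

# Base R jitter inside aes(): jitter(bedrooms) is re-evaluated each time the
# plot is built, so the points shift slightly between renders.
p <- ggplot(rentals, aes(x = sqft, y = jitter(bedrooms),
                         shape = type, colour = ordered(bath))) +
  geom_point() +
  labs(y = "Bedrooms (jittered)", colour = "Baths")

# ggplot2-native alternative: jitter vertically only, with a fixed seed so
# the same jitter is applied every time the plot is drawn.
p_native <- ggplot(rentals, aes(x = sqft, y = bedrooms,
                                shape = type, colour = ordered(bath))) +
  geom_point(position = position_jitter(width = 0, height = 0.2, seed = 42)) +
  labs(y = "Bedrooms (jittered)", colour = "Baths")
```

Either way, only the plotted positions move; the data frame keeps the true bedroom counts. Because the smallest gap between bedroom counts is 1, base R’s jitter() moves each value by at most 0.2 here. That is where values like "1.2 bedrooms" come from.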

See this? Yeah, I would change this guy. Yes, I’d like to change this; we saw how to do that earlier: you call the xlab() function. So yeah, let’s tidy this up, and potentially that too. And then this just bothers me: it’s an artifact of the fact that we didn’t have our data in the right format. bath should have been a factor, so I did the conversion in the command, but as a result poor ggplot doesn’t have a name for it, so it says, I’ll just use the command you used, ordered(bath). So we can tidy this up too. Again, when you’re cleaning data, when you’re examining it, just checking to see whether it’s sane, whether you read the data incorrectly, you don’t care about any of the