Data Science Initiative Introduction To R Bootcamp (part 2)

You want to do the same thing with which(w1), because which(w1) tells me which ones are TRUE, and the minus sign says drop those. Again, this is subsetting the rows: after the comma I'm not giving anything, which means give me everything in the columns. Now let's go back over here to where we made this plot.

All I have to do is change that. Notice that this is convenient: I just changed the data set. I could have actually done this command directly in here, inside the square brackets, with !w1 ("not w1"), and R would have said no problem, so I didn't even have to create a new object. Now I'm just going to run it and see what happens. The NAs went away here, and the NAs went away in the panel down here.
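Here is a minimal sketch of the two subsetting styles described above. It uses a small made-up data frame (`houses`, with hypothetical columns `sqft` and `price`), because the lecture's actual data set isn't named in this section. Both versions drop the rows where `sqft` is missing and keep every column.

```r
# Hypothetical toy data: rows 2 and 5 have a missing sqft value
houses <- data.frame(
  sqft  = c(850, NA, 1200, 2000, NA),
  price = c(150000, 210000, NA, 400000, 320000)
)

w1 <- is.na(houses$sqft)   # logical vector: TRUE where sqft is missing
which(w1)                  # positions of the TRUE values: 2 5

# Negative indices drop those rows; the empty slot after the comma keeps all columns
dropped_which <- houses[-which(w1), ]

# The same rows, selected directly with logical negation ("not w1")
dropped_not <- houses[!w1, ]

identical(dropped_which, dropped_not)  # TRUE

# Caveat: if nothing is missing, which() returns integer(0), and -integer(0)
# selects *no* rows, so the negative-index form silently empties the data
complete <- houses[!is.na(houses$sqft), ]
w2 <- is.na(complete$sqft)        # all FALSE
nrow(complete[-which(w2), ])      # 0
nrow(complete[!w2, ])             # 3
```

That edge case is one practical reason to prefer the `!w1` form when you only want to drop rows.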

And again, all we did was subset out the bits we didn't want and then go ahead and do the plot. Everyone happy? What else do you want to talk about? I have two things. Yep? The breaks? Breaks in what sense? We can change, we can control absolutely everything, and if you really want to specify different tick marks or locations, that's fine, but many times we don't actually want to.

Instead, we often just want to do something like the following. I'm going to get rid of this, and we'll just do it this way. One of the ways you could do it is with ylim: we'll say the axis goes between zero and ten thousand, and now I've changed the y limits. So we can change the range of the axes by using ylim and xlim, which are also the same names used for base plots, so this is very handy. Is that what you wanted to do?

But we haven't changed where the tick marks are. If we don't control the tick marks, we're still saying to ggplot, "Hey, go make pretty axis labels": take the min and the max, divide the range up, but don't use awkward values; round them to proper numbers. It will take care of that. If we want, we can go in and control all the tick marks.
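A minimal sketch of the two options just described, assuming ggplot2 is installed. The `houses` data frame and its `sqft`, `price`, and `bedrooms` columns are simulated stand-ins, not the lecture's data, so the limits are chosen to fit this fake data rather than the zero-to-ten-thousand range used in class.

```r
library(ggplot2)

# Hypothetical simulated housing data (not the lecture's data set)
set.seed(1)
n <- 200
houses <- data.frame(
  sqft     = round(runif(n, 600, 3500)),
  bedrooms = sample(1:5, n, replace = TRUE)
)
houses$price <- round(50 * houses$sqft + 10000 * houses$bedrooms + rnorm(n, sd = 10000))

# Set the axis ranges; ggplot still picks "pretty" tick marks on its own
p_limits <- ggplot(houses, aes(x = sqft, y = price)) +
  geom_point() +
  xlim(0, 4000) +
  ylim(0, 300000)

# Choose the tick marks (breaks) explicitly through the scale instead.
# The limits go here too: adding ylim() as well would replace this y scale.
p_breaks <- ggplot(houses, aes(x = sqft, y = price)) +
  geom_point() +
  scale_y_continuous(limits = c(0, 300000), breaks = seq(0, 300000, by = 50000))

# print(p_limits) or print(p_breaks) draws the plot
```

One design note: limits set on a scale (as `ylim()` does) drop any points outside the range, whereas `coord_cartesian(ylim = ...)` zooms the view and keeps them.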

We can control those as well, but let's not do that now. By the way, somebody asked me this yesterday: if you're publishing a paper and they want you to annotate the plot, or to control these different things, what you can do is export this as a JPEG or a PNG file, bring it into some image editor, and point-and-click your way to death to make the plot.

Then they come back, and the problem is that you've spent an hour or two manually modifying this image and sent it off to the journal or your PI or whoever, and they say, "Can you just do it with this data instead?" And you go, "Are you kidding me?" So again, there's a continuum here, and you have to pick your fights: identify when to do which. But you can programmatically construct all of the same things; you have control of everything in R.

So instead of manually doing it in Illustrator or something, you can do it programmatically, and then it's reproducible, which is a good thing. You have ultimate control, you're not constrained by the resolution of your screen, and you can be very, very specific. It's overkill for most things, but it's a good general-purpose alternative.
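As a sketch of the programmatic route, assuming ggplot2 is installed: the annotation is part of the code, and `ggsave()` writes the file at a size given in physical units rather than screen pixels. The `houses` data frame is simulated and the label text is a hypothetical placeholder.

```r
library(ggplot2)

# Hypothetical simulated housing data (not the lecture's data set)
set.seed(1)
n <- 200
houses <- data.frame(
  sqft     = round(runif(n, 600, 3500)),
  bedrooms = sample(1:5, n, replace = TRUE)
)
houses$price <- round(50 * houses$sqft + 10000 * houses$bedrooms + rnorm(n, sd = 10000))

# Titles, axis labels, and annotations are code, so rerunning the script
# with new data redraws everything without any manual editing
p <- ggplot(houses, aes(x = sqft, y = price)) +
  geom_point() +
  labs(x = "Square feet", y = "Price", title = "Price vs. size (simulated data)") +
  annotate("text", x = 1000, y = 250000, label = "hypothetical annotation")

# The file extension selects the device: .pdf gives a vector file;
# .png or .jpeg would give a raster image at the requested dpi
out <- file.path(tempdir(), "price_vs_sqft.pdf")
ggsave(out, plot = p, width = 6, height = 4, units = "in")
```

If the PI asks for a different data set, only the data-loading lines change; the figure is regenerated identically.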

Especially if you know you're going to end up redoing the same thing over and over again. Yeah, that's actually not a big problem. Let me just go over here; I have something lying around, so I'll just reuse it. One of the ways is this sort of thing.

The one thing I will say to you about error bars: what do the error bars stand for? There's no such thing as a standard error bar. A whole lot of assumptions go into all of this stuff, so always be careful about error bars. But yes, you can do this.

Just to practice reading these things: here we're drawing a scatterplot, because it's geom_point, with square footage and price, and then we're going to color based on the bedrooms. Then what I actually want to do is put a smooth line here, because when I put the smooth line on, it'll draw the error bars for me. So in some cases you just add a geom.
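A minimal sketch of that plot, assuming ggplot2 is installed and using simulated stand-in data (`houses`, with hypothetical columns `sqft`, `price`, `bedrooms`). `layer_data()` is used to peek at what the smooth layer actually computed, including the band limits `ymin` and `ymax`.

```r
library(ggplot2)

# Hypothetical simulated housing data (not the lecture's data set)
set.seed(1)
n <- 200
houses <- data.frame(
  sqft     = round(runif(n, 600, 3500)),
  bedrooms = sample(1:5, n, replace = TRUE)
)
houses$price <- round(50 * houses$sqft + 10000 * houses$bedrooms + rnorm(n, sd = 10000))

p <- ggplot(houses, aes(x = sqft, y = price)) +
  geom_point(aes(colour = bedrooms)) +            # colour mapped on the points only
  geom_smooth(method = "loess", formula = y ~ x)  # extra layer: smooth + error band

# What geom_smooth computed: a grid of x values with fitted y and band limits
smooth_data <- layer_data(p, 2)
head(smooth_data[, c("x", "y", "ymin", "ymax")])
```

Putting `colour` inside `geom_point()` rather than `ggplot()` keeps the colour mapping from being inherited by the smooth layer, so there is a single smooth over all the points. Stating `method` and `formula` explicitly also suppresses the "using method = 'loess'" message.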

This is adding an extra layer, which is a smooth geometry. You can also group observations, and it'll draw a separate smooth for each of the subsets, so you have a lot of control over this stuff. You can go over and do something else, too. Notice, by the way, the message that came out of here: "geom_smooth() using method = 'loess'". Are you all familiar with loess? No? Okay, that's fine. So we say, no, actually, what I want to do is use a linear model, thank you very much. And now it says, sure, I can do that for you, and now it will actually fit a linear model.
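A minimal sketch of both ideas, a linear model instead of loess and one smooth per subset, assuming ggplot2 is installed and using simulated stand-in data (`houses` and its columns are hypothetical).

```r
library(ggplot2)

# Hypothetical simulated housing data (not the lecture's data set)
set.seed(1)
n <- 200
houses <- data.frame(
  sqft     = round(runif(n, 600, 3500)),
  bedrooms = sample(1:5, n, replace = TRUE)
)
houses$price <- round(50 * houses$sqft + 10000 * houses$bedrooms + rnorm(n, sd = 10000))

# Mapping colour to a discrete variable (a factor) also splits the data into
# groups, so geom_smooth() fits a separate linear model to each bedroom count
p <- ggplot(houses, aes(x = sqft, y = price, colour = factor(bedrooms))) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

fits <- layer_data(p, 2)
length(unique(fits$group))   # one fitted line per bedroom count
```

To get one line per group without colouring, `aes(group = bedrooms)` does the grouping on its own.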

It puts the standard error band here. So on this type of plot, the error bars are actually an error envelope. If you have a bar plot, first of all, you shouldn't have a bar plot; second of all, now you've got counts, so what is the error there? There's an implicit assumption that you've got some distribution here: is it binomial, Poisson, all this sort of stuff? So you do want to control this. But if we go back over here and look for "error", there's geom_errorbar, geom_errorbarh for horizontal error bars, and so forth, and you can search through all the different geoms it has.
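A minimal sketch of `geom_errorbar()` drawn from a summary table, assuming ggplot2 is installed and using simulated stand-in data (`houses` is hypothetical). In keeping with the warning above, the ±1.96 SE bars are an explicit assumption: an approximate 95% interval that relies on each group mean being roughly normally distributed. Points are used instead of bars.

```r
library(ggplot2)

# Hypothetical simulated housing data (not the lecture's data set)
set.seed(1)
n <- 200
houses <- data.frame(
  sqft     = round(runif(n, 600, 3500)),
  bedrooms = sample(1:5, n, replace = TRUE)
)
houses$price <- round(50 * houses$sqft + 10000 * houses$bedrooms + rnorm(n, sd = 10000))

# Summarise price per bedroom count: mean and standard error of the mean
groups <- split(houses$price, houses$bedrooms)
stats <- data.frame(
  bedrooms = as.integer(names(groups)),
  mean     = vapply(groups, mean, numeric(1)),
  se       = vapply(groups, function(v) sd(v) / sqrt(length(v)), numeric(1))
)

# ASSUMPTION: +/- 1.96 SE is only an approximate 95% interval under normality
p <- ggplot(stats, aes(x = bedrooms, y = mean)) +
  geom_point() +
  geom_errorbar(aes(ymin = mean - 1.96 * se, ymax = mean + 1.96 * se), width = 0.2)
```

Computing the summary yourself makes the assumption behind each bar visible in the code rather than hidden inside a geom.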

Error bars: what is the difference here? What I'd encourage you to see is that there's a bunch of controls you can put in for all these geoms, like the methods: the smoothing is loess or lm, or lowess with a W versus loess with no W. What the heck's the difference between those two? But they are different, so you want to understand what the statistics is doing underneath before you start throwing down error bars and misleading everybody.

Anyway, this one comes from a normal distribution. If I've got two observations and they're counts, that ain't normal. So there are problems with some of this, but it's quite easy to add these in. As for exactly how this band is computed, I have no idea.
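To make the lowess/loess distinction concrete, here is a minimal base-R sketch on the built-in `cars` data set. It compares the two functions with their documented defaults; it is not necessarily what ggplot2 calls internally.

```r
# Built-in data set: stopping distance (dist) vs speed for 50 cars, sorted by speed
data(cars)

# loess(): formula interface; defaults span = 0.75 and degree = 2
# (local quadratic fits), with no robustness iterations (family = "gaussian")
fit_loess <- loess(dist ~ speed, data = cars)

# lowess(): older x/y interface; defaults f = 2/3 and local linear fits,
# plus iter = 3 robustness iterations that down-weight outlying points
fit_lowess <- lowess(cars$speed, cars$dist)

# Same data, different smoothers, so different fitted values
comparison <- data.frame(
  speed  = cars$speed,
  loess  = fitted(fit_loess),
  lowess = fit_lowess$y   # lowess() returns points sorted by x; cars is already sorted
)
head(comparison)
```

The defaults differ in bandwidth, local polynomial degree, and robustness, which is exactly why the "W" matters.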

This is where you actually have to look it up. What you should do is look at the help for geom_smooth. geom_smooth says, here's the mapping; well, we didn't specify a mapping, so it inherits the mapping. Then there's data; we didn't specify data, because it's inherited from the ggplot() call, the central plot. Then we have a stat, which is another matter: geoms and stats correspond to each other, so geom_smooth goes with stat = "smooth". It's computing statistics, but in a particular way.

Then we have all these other arguments. When you see ... (dot dot dot) in the help page, that's where you can pass down lots of extra arguments that aren't identified here but will probably get passed down to this method. So we can take a look: "Other arguments passed on to layer()."
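A minimal sketch of the geom/stat correspondence and the inheritance just described, assuming ggplot2 is installed and using simulated stand-in data (`houses` is hypothetical). Neither layer gets its own `mapping` or `data`, so both inherit them from `ggplot()`, and the two spellings should compute the same smooth.

```r
library(ggplot2)

# Hypothetical simulated housing data (not the lecture's data set)
set.seed(1)
n <- 200
houses <- data.frame(
  sqft     = round(runif(n, 600, 3500)),
  bedrooms = sample(1:5, n, replace = TRUE)
)
houses$price <- round(50 * houses$sqft + 10000 * houses$bedrooms + rnorm(n, sd = 10000))

base <- ggplot(houses, aes(x = sqft, y = price))   # mapping and data defined once

# geom_smooth() uses stat "smooth" by default; stat_smooth() uses geom "smooth"
p_geom <- base + geom_smooth(method = "loess", formula = y ~ x)
p_stat <- base + stat_smooth(method = "loess", formula = y ~ x)

cols <- c("x", "y", "ymin", "ymax")
same <- isTRUE(all.equal(layer_data(p_geom, 1)[cols], layer_data(p_stat, 1)[cols]))
same  # TRUE if both layers computed the same smooth
```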

So they go into layer(), which means they're not actually controlling the method here. Then here we have some information about whether it's lm, glm, loess, and so forth, and we have to read this help page to see what is getting controlled. This is one of those cases where you have a lot of control in ggplot, but sometimes you don't have enough, and then you need to take over and do the computations yourself.

Basically, it's not clear to me how we control how loess is going to get computed, and whether or not we can pass down extra arguments to loess. By the way, remember when we were doing the density plot yesterday and changed the bandwidth to make it more or less smooth? It's the same thing with loess, except we're now smoothing X versus Y rather than just a density.
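For what it's worth, the ggplot2 versions I'm aware of do expose these controls. The sketch below assumes ggplot2 is installed and uses simulated stand-in data (`houses` is hypothetical). `span` plays the role of the bandwidth; other `loess()` arguments go through `method.args`; and `se = FALSE` turns the band off.

```r
library(ggplot2)

# Hypothetical simulated housing data (not the lecture's data set)
set.seed(1)
n <- 200
houses <- data.frame(
  sqft     = round(runif(n, 600, 3500)),
  bedrooms = sample(1:5, n, replace = TRUE)
)
houses$price <- round(50 * houses$sqft + 10000 * houses$bedrooms + rnorm(n, sd = 10000))

base <- ggplot(houses, aes(x = sqft, y = price)) + geom_point()

# span acts like a bandwidth: smaller = wigglier curve, larger = smoother curve
wiggly   <- base + geom_smooth(method = "loess", formula = y ~ x, span = 0.3)
smoother <- base + geom_smooth(method = "loess", formula = y ~ x, span = 1)

# Arguments for loess() itself go in method.args (here: local linear fits)
linear_local <- base + geom_smooth(method = "loess", formula = y ~ x,
                                   method.args = list(degree = 1))

# se = FALSE drops the standard-error band entirely
no_band <- base + geom_smooth(method = "loess", formula = y ~ x, se = FALSE)
```

If a control you need isn't exposed this way, fitting the model yourself with `loess()` and plotting the predictions is the fallback the lecture describes.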

And we need to be able to control a lot of these different things, so it's not obvious how we actually control this. We can control whether the standard error band is shown or not. So we're doing very high-level things here: one command goes off and generates the loess curve and all the standard errors, but we don't necessarily get to control how that was done, and that's a pain.

So, the standard deviation can be of a population: you can compute the standard deviation of a population, no problem. The standard error is typically the standard deviation of your estimator; it's how much your statistic varies. So that's fine. So it's this...
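A minimal base-R simulation of the distinction just made, assuming independent, normally distributed observations (the population values below are arbitrary illustrative choices). The standard deviation describes the spread of individual observations. The standard error of the mean, estimated as `sd(x) / sqrt(n)`, describes how much the sample mean varies from sample to sample.

```r
set.seed(42)
pop_sd <- 10   # population standard deviation (known only because we simulate)
n <- 25        # sample size

x <- rnorm(n, mean = 50, sd = pop_sd)
sample_sd <- sd(x)                 # spread of individual observations (estimates pop_sd)
se_mean   <- sample_sd / sqrt(n)   # estimated standard error of the sample mean

# Check by simulation: repeat the "experiment" many times, keep each sample mean,
# and measure how much those means vary. Theory: pop_sd / sqrt(n) = 10 / 5 = 2.
means <- replicate(5000, mean(rnorm(n, mean = 50, sd = pop_sd)))
sd(means)   # close to 2
```

The SD of the simulated means is the standard error made tangible: it is the spread of the statistic, not of the data.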