Data Science Initiative Introduction To R Bootcamp (part 2)

That’s a good, sensible default; again, that’s the middle of our data set. So I just put these in: if you don’t specify where to start from, I’ll use this; if you do specify where to start from, I’ll use that.

I can call this function. I could call it as find near cheapest with Davis, and then I could say pos equals minus 120 and 38 if I want to control that. Alternatively, if this is always the same starting point, I just say, here’s the actual data I want you to look through. And when I get to the 11,000 or the 13,000, I can just hand that in as well. So I can now say something like Davis Day two, and these will all work. Is this clear?
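The calls described here can be sketched in R roughly as follows. This is only an illustration under stated assumptions: the spelling findNearCheapest, the parameter name tol, the column names, and the toy data are all hypothetical, since the real function and data set are on the instructor's screen and not in the transcript. Only the parameter name data, the starting point of minus 120 and 38, and the 0.01 distance come from the talk.

```r
# Toy stand-in for the Davis listings; column names and values are assumptions.
Davis <- data.frame(
  long     = c(-120.000, -120.005, -120.300, -119.998),
  lat      = c(  38.000,   38.004,   38.200,   38.002),
  price    = c(1500, 1100, 700, 1300),
  bedrooms = c(1, 1, 3, 2)
)

# Hypothetical sketch: the cheapest listing whose longitude and latitude
# are both within `tol` of the starting point `pos`.
# `pos` and `tol` have defaults, so callers may leave them out.
findNearCheapest <- function(data, pos = c(-120, 38), tol = 0.01) {
  near <- abs(data$long - pos[1]) <= tol & abs(data$lat - pos[2]) <= tol
  candidates <- data[which(near), ]           # which() ignores NA comparisons
  candidates[which.min(candidates$price), ]   # row with the lowest price
}

findNearCheapest(Davis)                          # uses the default pos
findNearCheapest(Davis, pos = c(-120.3, 38.2))   # overrides the starting point
```

Because pos is passed by name, the caller does not have to remember where it sits in the argument list, and any other data frame with the same columns can be handed in as the first argument.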

All I’ve done is write a function. I say, these are the parameters that you’re allowed to specify. When I wrote this function, I gave these names that I was going to use in the body, in the commands here. So instead of Davis I said data, and I make sure this code gets changed to actually refer to the variable that I declared as the parameter name. And I gave these default values. So I created a function: here are the parameters, here’s the body of the function, typically inside curly braces. And then I said, oh, by the way, let’s assign this function to some name, and that’s the name.

Now, the unfortunate thing is I never tested this function. So I source it, and now I have a function in here, and that’s what it looks like. Let’s just see. That’s what it gives me back: that’s the cheapest apartment within 0.01 of this guy. And then we said, no, no, you needed to actually have it such that Davis dollar bedroom was greater than or equal to two. And now, hopefully, we got a different answer. So this function is done; we can reuse this all the time.
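The step of restricting the data to two or more bedrooms before handing it to a function can be sketched like this. The toy data frame and the column names price and bedrooms are assumptions, and min() stands in for whatever function receives the subset.

```r
# Toy data; column names and values are assumptions, not the course data set.
Davis <- data.frame(
  price    = c(900, 1400, 1200, 1600),
  bedrooms = c(1, 2, NA, 3)
)

# Logical test, one value per row; it is NA where bedrooms is missing.
atLeastTwo <- Davis$bedrooms >= 2

# which() keeps only the TRUE positions, so the row with a missing
# bedroom count is dropped instead of coming back as a row of NAs.
twoPlus <- Davis[which(atLeastTwo), ]

min(Davis$price)     # cheapest listing overall
min(twoPlus$price)   # cheapest listing with at least two bedrooms
```

The subset is an ordinary data frame, so it can be passed to any function that accepts the full data set, which is why the same function gives a different answer once the condition is applied.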

And any time we want, we give it different inputs and we get different outputs. That’s the point. In my mind it’s really not a big stretch: the hard part was writing this code to actually do the calculations. Then it’s putting function and parentheses around it and specifying what the inputs are. I didn’t need to put this in; I could have just put down 0.01, but then if you wanted 0.02 you couldn’t control it. I could have just locked this down and said pos is always going to be the same thing, but I didn’t. This is more flexible, and it didn’t cost us anything to just say, here’s a good default, let’s use that, but you can override it if you want. Clear to people?

We also did something very similar to this. I said, loop over Davis, go through every column in the variable named Davis, and count the number of missing values. There wasn’t a function to do that, so I said, okay, I’m going to write my own.

A little special function of my own, so small I don’t even need to create it in a separate file, source it in, and reuse it later on, although maybe I will. I basically said all we want to do is sum of is.na of x. Davis is a list with however many columns. For each column, pass it in as x; then this code gets to see that column, and is.na of x gives trues and falses. The sum of trues and falses just gives you the number of trues. So we got that, and we just wrote a little function here. That is a function.

If I’d wanted to, I could have given it a name, countNAs, and I’ve now got a new function that I’ve written. This is a perfectly good function, just as good as anyone else’s function. It does exactly what I want, and no one thought to write this before. So that’s fine. Now I can just say sapply of Davis and countNAs, and sure enough, I can reuse this function all over the place. I can say, hey, countNAs of Davis dollar price, such that Davis dollar price is greater than a thousand, and I still get none, because there were none in the first place. But I can reuse it in different ways. This is how easy it is to write functions, and you can put them into a separate file and reuse them over and over again. Once that function becomes useful to you, you just lock it down and it’s over. So, any questions about writing functions?
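A minimal sketch of the missing-value counter described here. The spelling countNAs and the toy data frame are assumptions; sum(), is.na() and sapply() are standard R functions.

```r
# Count missing values in a vector: is.na() gives TRUE/FALSE per element,
# and sum() counts the TRUEs.
countNAs <- function(x) sum(is.na(x))

# Toy data frame; the price column has no missing values, as in the talk.
Davis <- data.frame(
  price    = c(1200, 800, 950, 15000),
  bedrooms = c(2, NA, 1, NA)
)

# A data frame is a list of columns, so sapply() calls countNAs once per column.
sapply(Davis, countNAs)

# The same function works on any vector, for example a subset of one column.
countNAs(Davis$price[Davis$price > 1000])
```

The function could also be written inline, as in sapply(Davis, function(x) sum(is.na(x))), but giving it a name lets it be sourced from a file and reused.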

Functions are hard because the computations they’re doing become hard, but I’m hoping that this part is easy. Here’s the name of the function; I declare this to be a function; I put parentheses. I need some inputs, so I could just stop here. And then this was the hard part: this is what I actually had to think about for half an hour to make the computations reliably correct. But other than that, this ain’t hard to actually write.

And this one, how many NAs, that’s an even simpler function. This is the one we just wrote. We put it in a file, and we can source this in. It just takes one argument, no extra things; you can only specify its input. It consumes a vector, goes ahead and does its thing, and gives you an answer. For those of you used to other programming languages: where’s the return statement?

There isn’t one; you don’t need it. Where are the curly braces? You don’t need them, because there’s only one line in this function. These are kind of throwaway things, or things that you can actually reuse over and over again. Everyone happy?
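To make the point about the missing return statement and braces concrete, here is a small side-by-side sketch. The names howManyNAs and howManyNAsExplicit are hypothetical spellings. In R a function returns the value of the last expression it evaluates, and braces are only needed to group more than one expression.

```r
# One-expression body: no curly braces and no return() call.
howManyNAs <- function(x) sum(is.na(x))

# The same function spelled out with braces and an explicit return().
howManyNAsExplicit <- function(x) {
  return(sum(is.na(x)))
}

v <- c(3, NA, 7, NA)
howManyNAs(v)           # both calls give the same count
howManyNAsExplicit(v)
```

Either form is fine; the short one is common for small helpers that fit on a single line.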

What you want to do is very specific; we could talk about that afterwards, since it’s not necessarily the thing that everyone wants or needs to do. There’s one other thing. Again, feel free to leave. There’s a person who was here yesterday who had a problem, which is, here’s a data set I got, if I remember where it is.

Okay, this one. Here is a data set. Here’s a date and time separated by a comma, so this is all one field. This is a CSV; you can tell that because it’s called CSV. Well, no, you can’t tell it’s a CSV, because it ain’t a CSV.