Data Science Initiative Introduction To R Bootcamp (part 2)

The column names of the data frame: I know how I’d do it, and I’d do it programmatically. But what’s the better way to do it? I’m not going to write code to actually go and bother to do this. Come on, this is way too much trouble for this, unless these change, or I’ve got millions of files and they all have to work that way. If I had to do this, I could.

But, you know, do the most obvious thing, which is just cut and paste these things. If these are very, very big files, you may not actually be able to load them into anything to be able to cut and paste, but basically we can. Let’s just do this again.

Take the CSV file [name unclear]. I’m going to grep for "!names" in a, because that’s the line I’m looking for. It’s number 11, so I’m going to say a[11]: that’s my line. Let’s call this b. There’s another little function here called strsplit, which I call on b, and I say split it by comma. Look what I get back: all the elements I want.

It’s kind of weird: this is a list with one element whose contents is a vector. That’s because strsplit will take in a character vector and split each string on this character, the comma, and the strings may have different numbers of pieces, so the results have to come back this way.

Let’s just do this. How do I get rid of the first one, the "!names"? And what I need to do is add the date. So what do I do? There we go: we’ve got our names now. So again, programmatically we can manipulate these things.

I tend to find reading data into R is one of the most complicated things, because the data don’t come in simple form. Somebody else may give it to you as a CSV file, in which case they’ve done all the hard work, but in many cases they don’t.
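The header steps just described (read the lines, grep for the "!names" line, split it on commas, drop the marker, add the date) can be sketched roughly as follows. The file contents, the column names, and the position of "date" are all made up for illustration, since the real file is not shown in the session; only readLines, grep, and strsplit come from the talk.

```r
# Hypothetical stand-in for the real file, which is not shown in the talk.
tmp <- tempfile(fileext = ".csv")
writeLines(c("!comment,some header text",
             "!names,temp,flow,depth",
             "1.5,2.0,3.1"),
           tmp)

a <- readLines(tmp)                    # one string per line of the file

i <- grep("!names", a, fixed = TRUE)   # index of the line holding the names
b <- a[i]

parts <- strsplit(b, ",")              # a list: one element per input string
els <- parts[[1]]                      # the character vector inside that list

# Drop the leading "!names" marker and add a name for the date column.
# Putting "date" first is an assumption; the talk only says it must be added.
varNames <- c("date", els[-1])
print(varNames)
```

The double square brackets matter here: parts[1] would still be a list of length one, while parts[[1]] is the character vector itself.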

For example, this is the data that we were playing with from the Davis rental units. This is the form they come in. You tell me where the square footage is, or where the laundry is, or whatever it is. Now you’ll see why the square feet and the baths didn’t always come out right: sometimes they’re not there, sometimes they are.

But this is where we get a lot of data from: scraping the web. At other times we get data from these funny machines which have a totally different file format, but they’re close enough that we can do some magical repairs.

If you’re trying to read HTML files, don’t come and talk to us, because it’s not painless. There are good and bad ways to do it, but they’re there; they’re fine.

But you know what we did, if you read through the notes I wrote on this kind of mini workflow. You didn’t know about the readLines function, but that’s fine. The grep function is also really powerful, because it can find these text patterns in many strings and identify which ones match, and there’s a whole other set of functions that go with grep.

But after this strsplit, it’s just subsetting: it’s just playing with subsets of little vectors. They’re no longer data values that we care about; they’re actually text that we care about in order to get the data values.
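As a small sketch of that idea, here is grep finding which strings match, and plain subsetting doing the rest. The ad strings below are invented for illustration and are not the actual Davis rental data; grepl and sub are two of the related functions in base R.

```r
# Made-up rental ads; note the second one has no square footage at all.
ads <- c("2 bed, 1 bath, 850 sqft, laundry",
         "studio, no pets",
         "3 bed, 2 bath, 1200 sqft")

hasSqft <- grepl("sqft", ads)   # logical vector: does each string match?
idx <- grep("sqft", ads)        # integer positions of the strings that match

# strsplit returns a list because each string can yield a different
# number of pieces.
pieces <- strsplit(ads, ", ")
nPieces <- lengths(pieces)

# From here on it is just subsetting: within each matching ad, keep the
# piece that mentions sqft, then strip the text to get the number.
sqftText <- sapply(pieces[idx], function(p) p[grep("sqft", p)])
sqft <- as.numeric(sub(" sqft", "", sqftText, fixed = TRUE))
print(sqft)
```

Ads with no match are simply left out of idx, which is one way the "sometimes it’s there, sometimes it isn’t" problem shows up in practice.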

Really, an awful lot of this stuff is just subsetting. Any questions? You’re very quiet. No? If nobody’s got any questions, I’m just going to go to lunch. I’ll be around later on this afternoon, from about 12:30 onwards.

If you want to come back and check, come back and we can go through any questions or exercises that you want. If you don’t come back, then I’m going to go home early, which is even better, but I should be here till about two o’clock. If people come back, then I will stay until 3:00 or 4:00 o’clock. Have fun.