Data Science Initiative Introduction To R Bootcamp (part 2)

By the way, you can tell because it’s got a whole lot of garbage at the top. There’s no standard format for a CSV, okay? There are claims to have one, but that’s fine. But this is close to being a CSV, and that’s the trouble: it’s so close to being a CSV. Comma-separated values: comma, this is a value, this is a value, this is a value, and so forth. How do I read this into R? You can’t go anywhere without that; all the stuff we’ve talked about, subsetting and plotting and all that, comes after.

You can’t do anything if you can’t get your data in, unless you’re simulating data, which is fine, but even then you need to program stuff. How do you read this in? This is about strategy, okay? There is a function for this.

So, I was nice to you: I gave you an RDS file which was already preprocessed. The dates were actually formatted as dates, the factors were set up as factors the way I want them, and the title wasn’t a factor, so it was easier to deal with. The trouble is somebody had to read that data in and then do the processing. This file comes to you in a form you could load into Excel, you can load into Python, you can load into any other application, so this is a universal format, but unfortunately not the way we want it. But there is a function called read.csv in R, and it takes a regular CSV file. In fact, I’ll show you.
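The difference between a preprocessed RDS file and a CSV can be sketched on a toy data frame. This is only an illustration: the column names and values below are hypothetical, not the course data set.

```r
# Hypothetical toy data standing in for the course data set.
d <- data.frame(
  date  = as.Date(c("2014-01-01", "2014-01-02")),
  group = factor(c("a", "b")),
  title = c("first", "second"),
  stringsAsFactors = FALSE
)

rds_path <- tempfile(fileext = ".rds")
csv_path <- tempfile(fileext = ".csv")

saveRDS(d, rds_path)                        # keeps R's column types
write.csv(d, csv_path, row.names = FALSE)   # plain text only

from_rds <- readRDS(rds_path)
from_csv <- read.csv(csv_path, stringsAsFactors = FALSE)

# Compare the column classes after each round trip.
print(sapply(from_rds, class))
print(sapply(from_csv, class))
```

The RDS copy comes back with its Date and factor columns intact, while the CSV copy comes back as plain character columns that somebody has to process again.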

Here’s another version of the same file. This one was manually edited to take the same data that we just saw and make it easier to read into R. It’s a lot of work to manually edit this, and they actually added in some extra columns, which is even harder. Okay, so this is the pain. How would you read this in? Let’s just deal with this one first.

But I wouldn’t do this, because I don’t want to edit anything manually. First, I’ll make errors, because doing something manually is going to be error-prone, and second of all, I’m never going to be able to reproduce it, okay? And I know somebody is going to keep giving me these files. Somebody says, “Oh, I’ve got this file, can you read it?” and I say, “Will you ever have another file like this?” and they say, “No, no.”

No. And then the second I finish doing it manually they go, “Oh, I have a thousand other ones. It’s amazing what you did, so can you process these other ones?” So no, we’re going to do this programmatically. I would start with the real data set and actually come up with a strategy for reading it in, which is not overly hard, and it uses all the same tricks we were using for the last 24 hours, okay? Subsetting, finding patterns, you know, finding the things we want. But how?
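One possible shape for such a strategy, sketched under assumptions: the file contents below are invented, and the rule "the header is the line starting with date," is a hypothetical pattern, not something taken from the real file.

```r
# Invented example of a file that is "almost" a CSV: junk lines, then a table.
raw_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Report generated by some tool",
  "Units: km/s",
  "",
  "date,velocity,comment",
  "2014-01-01,0.38,ok",
  "2014-01-02,1.51,ok"
), raw_path)

# Read the file as plain lines of text first.
lines <- readLines(raw_path)

# Find the header line by pattern (assumed pattern, see above).
header_at <- grep("^date,", lines)[1]

# Subset to the table part and hand only that to read.csv.
table_lines <- lines[header_at:length(lines)]
d <- read.csv(text = table_lines, stringsAsFactors = FALSE)

print(d)
```

Because every step is code, the same script can be rerun on the next thousand files without any manual editing.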

Let’s call read.csv on the edited CSV, and I’m going to turn off these, well, we’re just going to use the defaults. Okay, there we go, we’re done. Okay, so what do I do next? class, absolutely. Okay, then class, because this is the thing I do: did it work? Hopefully. read.csv should either throw an error or give me back a data frame. That’s fine. The next thing I do is dim, because it’s a data frame. Okay, there are twelve of these guys. Okay, names(d). There we go, looks pretty good. There’s a comment at the end, okay, fine. What do I do next? My next step, I’ve shown it to you a couple of times.
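The first checks after reading a file can be sketched as follows. The file here is a small invented stand-in, and `stringsAsFactors = TRUE` is set explicitly on the assumption that the session in the lecture used the default of R versions before 4.0.0, where text columns became factors.

```r
# Invented stand-in for the edited CSV.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "date,velocity,comment",
  "2014-01-01,0.38,ok",
  "2014-01-02,1.51,ok",
  "2014-01-03,missing,sensor fault"
), csv_path)

# stringsAsFactors = TRUE mimics the pre-4.0.0 default (an assumption).
d <- read.csv(csv_path, stringsAsFactors = TRUE)

print(class(d))   # did we get a data frame back?
print(dim(d))     # rows and columns
print(names(d))   # do the column names look right?
```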

One of two ways. Is that what we expect? This is one way of doing it. I asked the more specific question, which is more limited, and which I find easier to understand because I don’t see all this extra nonsense. But the other one is still pretty good, it’s actually quite useful too. Both are very useful.

What’s funny about this line, the velocity one? What’s the class? It’s a factor, which means what? What does it correspond to? It’s a categorical variable, and here are the first couple of values. Do they look like categories to you? Something’s gone wrong here, okay? This is not good news.
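The transcript does not name the two ways. A plausible pair, offered here as an assumption, is `str()` for the full overview and `sapply(d, class)` for the narrower question of each column's class. The data are invented, with one non-numeric entry to make the velocity column come in as a factor.

```r
# Invented data: one bad entry ("missing") in an otherwise numeric column.
d <- data.frame(
  date     = c("2014-01-01", "2014-01-02", "2014-01-03"),
  velocity = c("0.38", "1.51", "missing"),
  stringsAsFactors = TRUE   # mimic the pre-4.0.0 read.csv default
)

# Way 1: the broad overview, classes plus the first few values.
str(d)

# Way 2: the specific question, just the class of each column.
col_classes <- sapply(d, class)
print(col_classes)

# The first few velocity values do not look like categories.
print(head(d$velocity))
```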

So the person who did this, who was showing it to me because it’s a nice example, just assumed it came in nicely. Then they had to multiply velocity by 1.07 to correct it. Well, you can do that, but hang on a second: you can’t multiply a factor, it’s not a number, okay? So we say d$velocity multiplied by 1.07. We get NAs, and we get this warning message which says it’s not meaningful for factors. That should have tipped us off, and it would have tipped us off if we read the warning messages. Read every warning message, because it’s nearly always an error, okay? But notice, because I know the punchline for this, I found the error before we ever did anything.
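What happens when a factor is multiplied can be reproduced on a toy vector. The values are invented, and the handler is only there to capture the warning text so it can be printed next to the result.

```r
# Invented values; "missing" is what forces the column to be a factor.
velocity <- factor(c("0.38", "1.51", "missing"))

warning_text <- NULL
corrected <- withCallingHandlers(
  velocity * 1.07,
  warning = function(w) {
    # Keep the warning message, then carry on.
    warning_text <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)

print(corrected)      # every element is NA
print(warning_text)   # the "not meaningful for factors" warning
```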

I said this didn’t work, okay? I anticipated that velocity should be a numeric vector, and it wasn’t, and I saw it here because I did some diagnostics before I went further. So what happened? How would you fix this?

Now you’ve all gone very quiet. What’s my first step? Hmm, rename the class? How do you mean? So d$velocity? Just basically say what you want to do. I like your idea. So I could do that, something like this, which is I could set the class. You know, this will actually do something.
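What setting the class on a factor does can be sketched on a toy vector (invented values). The last line shows, for contrast, one common idiom for turning a factor's labels into numbers; it is an addition here, not something the transcript has reached yet.

```r
# Invented values; the levels sort as "0.38", "1.51", "2.04".
velocity <- factor(c("0.38", "1.51", "0.38", "2.04"))

# A factor stores integer codes that index into its levels.
codes <- as.integer(velocity)

# Assigning the class strips the factor class and exposes those codes.
relabelled <- velocity
class(relabelled) <- "numeric"
print(as.vector(relabelled))   # the codes, not 0.38, 1.51, ...

# Contrast: convert the labels, not the codes.
values <- as.numeric(as.character(velocity))
print(values)
```

On a real column with many distinct values the codes can be large, which would be consistent with a value like 0.38 turning into a number in the hundreds.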

Okay, d$velocity. Now let me just take a look at the first few of these. Okay, is that what I wanted? That’s the first row, and velocity is the second element: we went from 0.38 to 151. Okay, good idea. Yeah, you need to just try things. It did something, but you’ve got to have a backup, to check: did it actually do the right thing? You can’t break anything, because all I’ve got to do is, okay, I’ve now actually destroyed d.

So let’s go back. That’s the great thing about having my code in a script: let’s go back and start again. Okay, now I’ve recovered to where I was. That didn’t work, and if you want to know why, I’ll tell you why, but it just didn’t work, so let’s move on. Okay, but you’re on the right track. Why don’t we just turn them into numbers? Okay, just setting the class might be a little bit.