The winds of fate hurled yon TPKoC (The Pirate King of Cars) to the shores of Barnes and Noble, a favorite place to go from ages past and, hopefully, on into the future.
Of course, they may not have books there in the future, but I’m sure they will continue to have things of interest.
In any case, there I was in this beautiful place where, were there fewer than 4 kids in my life, I might spend hours.
Time was of the essence, and I found something interesting… a magazine, “Used Car Prices,” from VMR (Vehicle Market Research).
Methinks that data just lying there could become information. I quickly seized the magazine (and paid for it) and left.
Oh, the analysis I could do, and the information I could release to the world! I quickly decided to type the contents of the book into an Excel spreadsheet in preparation for loading it into SQL Server.
But… The data might be fine for someone trying to find out what a car is worth (though prices seem about 2K low, even for that purpose); however, it is not fine for doing data analysis.
Vehicle descriptions differ from year to year for the same car. Abbreviations are used, or not used, willy-nilly (look at Audi: Premium Plus, Premium+, or Prem+).
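One way to tame the abbreviation mess before loading anything is to fold every spelling into a single canonical name. The snippet below is only a rough sketch in Python, not something the magazine or SQL Server provides: the `TRIM_ALIASES` table and the `normalize_trim` function are hypothetical names, and picking “Premium Plus” as the canonical form is an arbitrary choice for illustration.

```python
import re

# Hypothetical alias table. The three spellings are the ones seen in the
# magazine; the canonical form "Premium Plus" is an arbitrary choice.
TRIM_ALIASES = {
    "premium plus": "Premium Plus",
    "premium+": "Premium Plus",
    "prem+": "Premium Plus",
}


def normalize_trim(raw: str) -> str:
    """Return the canonical trim name, or the cleaned input if no alias is known."""
    # Trim the ends and collapse runs of whitespace into single spaces.
    cleaned = re.sub(r"\s+", " ", raw.strip())
    # Drop any space before a '+', then lowercase, so "Prem +" matches "prem+".
    key = re.sub(r"\s*\+", "+", cleaned).lower()
    return TRIM_ALIASES.get(key, cleaned)


for spelling in ["Premium Plus", "Premium+", "Prem+", "Prestige"]:
    print(spelling, "->", normalize_trim(spelling))
```

Anything not in the table passes through unchanged, so unknown trims are left alone rather than guessed at; the table would have to grow by hand as new spellings turn up.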
There are header lines for each model, but they include information that can only be on the detail level. For example:
2009 Aud TT FW/2.0T-IfT (200hp) – this is all header information.
But when you look at the details, not all Audi TT coupes/convertibles from the year 2009 have the same engine (2.0T, 3.2 V6). Not all Audi TT coupes/convertibles have the same drive train (FWD, AWD). So, clearly, this information cannot be header information if it isn’t the same for all the detail/children records!
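The header-versus-detail problem can be checked mechanically once the rows are typed in. The following is a minimal sketch, assuming each detail row is a Python dict; the field names (`engine`, `drive`) and the `varying_attributes` helper are hypothetical, and the way engines are paired with drivetrains in the sample rows is invented for illustration. Only the individual values (2.0T, 3.2 V6, FWD, AWD) come from the magazine.

```python
from collections import defaultdict

# Hypothetical detail rows. The engine and drivetrain values are the ones
# mentioned above; which engine goes with which drivetrain is made up.
rows = [
    {"year": 2009, "make": "Audi", "model": "TT", "engine": "2.0T", "drive": "FWD"},
    {"year": 2009, "make": "Audi", "model": "TT", "engine": "2.0T", "drive": "AWD"},
    {"year": 2009, "make": "Audi", "model": "TT", "engine": "3.2 V6", "drive": "AWD"},
]


def varying_attributes(rows, key_fields, candidate_fields):
    """Map each header key to the candidate fields that take more than one value."""
    seen = defaultdict(lambda: defaultdict(set))
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        for field in candidate_fields:
            seen[key][field].add(row[field])
    # A field with more than one distinct value cannot live on the header.
    return {
        key: sorted(f for f, values in fields.items() if len(values) > 1)
        for key, fields in seen.items()
    }


result = varying_attributes(rows, ["year", "make", "model"], ["engine", "drive"])
print(result)
```

Any field that shows up in the result for a given year, make, and model differs among the children, so it has to be stored on the detail rows rather than the header.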
It is exhausting going through this data, but in the end it might be worth it. Will the analysis of the data be perfect? No. Screwed-up master data means the analysis can never be more than OK…