Sunday, 11 March 2012

Why Open Data Sucks Right Now

I've recently been trying to use open data. Now, for those of you who are not familiar with the concept of Open data, take a look at the Wiki article. I'm not being facetious; it's just that Wikipedia sums it up pretty well.

Now, the idea is that all this data (e.g. bus timetables and whatnot) being available to the public is kinda cool, right? Yes.

HOWEVER[1], as a developer, trying to consume Open data at the moment is a complete pain in the backside.

Open Linked Data is where it's at, in my opinion. I say this for several reasons:

Firstly: Data is almost entirely useless to the public if it's in an obtuse format! A good example of this is the Government's recent push for Open data.
It's all well and good that this data is out there, but for the love of all things good, please, please, please put it in a machine-readable format!! How much more effort is it to click .csv when you save in Excel?
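
To show how little a developer is asking for here, this is a minimal sketch of consuming a CSV file with nothing but Python's standard library. The timetable rows and the column names (route, stop, departs) are made up for illustration; they are not taken from any real government data set.

```python
import csv
import io

# Hypothetical bus timetable, as it might look after "Save as .csv" in Excel.
TIMETABLE_CSV = """route,stop,departs
X5,High Street,08:15
X5,Station Road,08:27
12,High Street,08:40
"""

def departures_from(stop, csv_text):
    """Return (route, time) pairs for every departure from the given stop."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["route"], row["departs"]) for row in reader if row["stop"] == stop]

print(departures_from("High Street", TIMETABLE_CSV))
# prints [('X5', '08:15'), ('12', '08:40')]
```

That is the whole job when the format is machine readable. Try doing the same against a scanned PDF.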

Secondly: Data is almost entirely useless to the public if it's in an obtuse format! Now, you may be noticing a pattern here; however, this time I'm talking about the construction of the data and not the file type.
This is also where the Linked part comes in.

OK, so hypothetically, say I were to release all the data about the meals I ate, and you (for some crazy reason) wanted to make an application (let's call it Meal-o-riffic for the moment) that used this data.
To make 'Meal-o-riffic' you would do something like the following:
  1. Look at data to analyse its format.
  2. Write application to use said format.
  3. Stand back and smile.
So all is dandy, right? You have an application that tells you what meals I'm eating! Perfect!
Suddenly you find out that Alina[2] is also publishing her meals and your app needs to expand, so you do the following:
  1. Look at Alina's meal data.
  2. Cry.
"Why?" you ask. Well, those of us who haven't looked at Open data ask the question; the rest of us start getting flashbacks.
The reason you would break down in tears in this fictional example is that the data that you have just come across is syntactically "completely bloomin' different" to the set you designed your app for!

So two sets of 'Open' data. One headache.
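
For the developers in the room, here is roughly what that headache looks like in code. This is only a sketch: both feeds, their field names and their date formats are invented for the Meal-o-riffic example. The point is that every new publisher costs you another hand-written adapter.

```python
from datetime import datetime

# Hypothetical feed 1: a flat list of records with ISO dates.
MY_MEALS = [
    {"date": "2012-03-10", "meal": "dinner", "dish": "curry"},
]

# Hypothetical feed 2: the same information, but nested by day, with a
# different date format and different capitalisation.
ALINA_MEALS = {
    "10/03/2012": {"Dinner": ["soup", "bread"]},
}

def from_my_feed(records):
    """Adapter for feed 1: yield (date, meal, dish) tuples."""
    for rec in records:
        day = datetime.strptime(rec["date"], "%Y-%m-%d").date()
        yield (day, rec["meal"].lower(), rec["dish"])

def from_alina_feed(days):
    """Adapter for feed 2: flatten the nesting into the same tuples."""
    for day_text, meals in days.items():
        day = datetime.strptime(day_text, "%d/%m/%Y").date()
        for meal, dishes in meals.items():
            for dish in dishes:
                yield (day, meal.lower(), dish)

# Only after both adapters exist can the app treat the data uniformly.
all_meals = list(from_my_feed(MY_MEALS)) + list(from_alina_feed(ALINA_MEALS))
for day, meal, dish in all_meals:
    print(day.isoformat(), meal, dish)
```

Two publishers, two adapters. Ten publishers, ten adapters, and that is before any of them changes their format.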

The technologically astute among you will be bellowing the word 'Ontology' right now, and the non-techies will be wondering what the hell an 'Ontology' is[3].
Now, yes, my binary brethren, an ontology may help us here. To give structure to our data would be amazing and, when used, ontologies are the bee's knees. But this brings me to my next point:
Standards should be bloody standard!
Fair enough, you have your data open. Well done, it's semantically linked using an ontology.
You get all the way to the last hurdle, but then you shoot yourself in the head by not using an already existing standard that does what you need it to do.

Linked data is about making the web of things connected and you just made your data an island. Congratulations. And the most common reason I see for this? Not-invented-here syndrome. So our data got so much nicer and now we've been plunged back into the 'Meal-o-riffic' example, except that varying ontologies are the issue, not poor data.
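
A rough sketch of the island problem, using plain Python tuples as stand-ins for subject/predicate/object triples. Every URI here is hypothetical and made up for this post; no real vocabulary is being quoted. One publisher uses a shared term, the other invents their own, and a query against the shared term quietly loses half the data until the consumer writes a mapping by hand.

```python
# All URIs below are hypothetical, invented purely for this sketch.
SHARED_ATE = "http://example.org/shared-food-vocab#ate"
HOME_GROWN_ATE = "http://example.org/alinas-own-ontology#consumedDish"

MY_TRIPLES = [
    ("http://example.org/people/me", SHARED_ATE, "curry"),
]
ALINA_TRIPLES = [
    ("http://example.org/people/alina", HOME_GROWN_ATE, "soup"),
]

def dishes(triples, predicate):
    """Return every object linked by the given predicate."""
    return [obj for _subj, pred, obj in triples if pred == predicate]

combined = MY_TRIPLES + ALINA_TRIPLES

# One query against the shared predicate silently misses Alina's data.
print(dishes(combined, SHARED_ATE))   # prints ['curry']

# The consumer has to maintain a hand-written alignment to get it back.
ALIGNMENT = {HOME_GROWN_ATE: SHARED_ATE}
aligned = [(s, ALIGNMENT.get(p, p), o) for s, p, o in combined]
print(dishes(aligned, SHARED_ATE))    # prints ['curry', 'soup']
```

Had both publishers used the same existing term, the alignment table would not need to exist at all.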

So what actually started this rant is what I've recently been trying to do, that is: take Open linked data from universities (and other establishments) and extract information to be loaded into an Android app.
I won't go into specifics because my team leader will murder me. However, let's say I am taking data that is made much, much nicer by semantic linking.

So what will make Open data beautiful?
Now, after all my ranting about formats, standards and the rest, all this comes down to one simple solution.
What will make open data better? What will make it usable and useful? What will push people to care about the open data they produce?
Simply this: using it. If we start using the data, we can email, write, text and punch people until their data is in a standard, useful and usable format. How do I know if my data is correct until someone tries to put pins on a map for every meal I've eaten? I simply don't. And this is the rock/hard place that open data lies in at the moment:

It's all so moon-hoveringly bad because no-one uses it.
No-one uses it because what is out there is moon-hoveringly bad[4].

1: I write that in caps because it's such a big however. If you were in the room with me, I would have to scream it at you to get across how big a however it really is.
2: Randomly selected name: Honest.
3: Put simply, an ontology is a way of providing a structure and some meaning to your data.
4: I'm pushing for 'moon-hoveringly bad' to be a new industry standard for when something is substandard.