Have you heard about the IATI Data Quality Feedback platform yet? It is an online platform that we at Data4Development have developed, and it has been available for some time now! If you want to make sure your organisation publishes high-quality IATI data, this tool can help you with “human-readable” advice on how to improve your IATI data.
With the support of the Dutch Ministry of Foreign Affairs and the British DFID, we have been able to transform our in-house tool into a publicly available platform that tests the quality of your IATI data against the requirements of the IATI Standard, as well as the specific requirements of both donors.
The road to general availability
Scaling up from an in-house tool to a public service came with new technical challenges: how to reliably process all public IATI data from all IATI publishers in a timely manner, in whatever shape, version, or form. Our initial attempt to deal with all IATI data looked successful at first sight. But it sometimes broke down. Too many times.
We entered the realm of Site Reliability Engineering, which Google describes as “what happens when a software engineer is tasked with what used to be called operations.”
It looks like we have “broken the spell”, and the platform is now stable enough to be used as a public service. To give you a peek into some of the work that has been going on, this is what (part of) our monitoring looked like over the last month:
We’re using a cluster of computers (for the technically inclined: a Kubernetes cluster, running on Google Cloud infrastructure) to regularly download all files, validate their content, and offer feedback in various forms.
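As a rough illustration of the “download and validate” step, a minimal worker might look like the sketch below. The function names and the specific checks are ours, invented for this example; they are not the platform’s actual code.

```python
import urllib.request
from xml.etree import ElementTree


def validate_iati(data: bytes) -> list[str]:
    """Return human-readable feedback messages for one downloaded IATI file."""
    messages: list[str] = []
    try:
        root = ElementTree.fromstring(data)
    except ElementTree.ParseError as exc:
        # A file that is not well-formed XML gets a single top-level message.
        return [f"not well-formed XML: {exc}"]
    if root.tag != "iati-activities":
        messages.append(f"unexpected root element <{root.tag}>")
    if "version" not in root.attrib:
        messages.append("missing @version attribute on the root element")
    return messages


def fetch_and_validate(url: str) -> list[str]:
    """Download one dataset and validate it; download errors become feedback too."""
    try:
        with urllib.request.urlopen(url, timeout=60) as resp:
            return validate_iati(resp.read())
    except OSError as exc:
        return [f"download failed: {exc}"]
```

In the real platform, many such workers run as jobs on the cluster, so a single malformed file produces feedback messages instead of taking a worker down.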
- Some of the “clean-up jobs” were running fine all the time: these are the bar charts on the right.
- We managed to structurally reduce computer memory use for the validation a bit around February 10th (orange on the top left). We also made successful improvements leading to fewer problems with computers running low on resources, both since the end of January and on February 10th (bottom middle).
- But the main win came last week: on the evening of February 13th, around 10 PM, we deployed an improved approach to processing files:
- The low-resource problems were gone completely (bottom middle). They literally stopped overnight.
- The structural memory use went down (top left).
- We tested automatically scaling up the system to catch up with work: more processor capacity was used on Thursday and Friday (bottom left), after which usage returned to a “normal level”.
- The whole Data4Development team started testing intensively, and we went over all ~940 publishers and their datasets to make sure these were all presented as expected. You can see some spikes in accessing the cluster (top middle) for our testing sessions: a full-on session on Saturday, and more sessions in the week after.
We’re adding more test scenarios all the time.
One of the remaining problems was dealing with huge IATI files (50 to 80 MB) that had large numbers of technical errors (50 to 80 thousand “schema validation” messages). There are only 1 to 5 datasets like this, so we already provided feedback for 99.95-99.99% of datasets. And now we process those last few cases as well.
With another dashboard, we test for known anomalies every hour:
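An hourly anomaly check can be as simple as comparing a few counts against expected ranges. The metric names and ranges below are invented for illustration (only the ~940 publishers figure comes from our own count above):

```python
def check_counts(counts: dict[str, int]) -> list[str]:
    """Compare observed counts against expected ranges; return alert messages."""
    expected = {
        "publishers": (900, 1000),  # we track roughly 940 publishers
        "datasets": (4000, 7000),   # illustrative range, not a real threshold
    }
    alerts = []
    for name, (lo, hi) in expected.items():
        value = counts.get(name)
        if value is None or not lo <= value <= hi:
            alerts.append(f"anomaly: {name}={value}, expected {lo}..{hi}")
    return alerts
```

Each empty result means “all quiet”; any alert shows up on the dashboard for a human to investigate.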
We practise “behaviour-driven” development for our (technical) products and infrastructure. It is a “journey of discovery”: together we explore what adds value to your work, and how to prioritise, execute, monitor, and improve. It’s not always easy, but we continuously look for ways to improve both our products and our working methods.
We look forward to continuing our IATI data quality feedback journey and sharing our lessons with you!
Want to know more about the IATI Data Quality Feedback platform? Reach out to us!