by Vas Mylko
Last week was terrible. Pitch-black terrible. Recently we were hit by an avalanche of very different issues. As a result, we didn't notice that search at curiosio.com was not working well for any country except Brazil. It was only when one persistent user tried a dozen searches that I noticed 80% of them had failed, in a context that should have been rock solid.
Networking. Around the time of the recent AWS incident with internal load balancing, we ran into another, still unidentified problem with incoming TCP traffic: unpredictable interruptions on the channel we had "optimized" for performance a long time ago. We fixed it by removing the customization and got even better bandwidth. Antifragility.
Data. DBpedia, one of the data sources we run locally, crashed together with the OS (or maybe it was the other way around, and the OS crashed together with DBpedia). It did not start up properly after that. A DBpedia instance is very fragile, you know… Try setting one up, with Virtuoso and so forth. We updated the entire instance to the latest version. Some adapters and connectors stopped working, so we had to rework them to pull the right data, or rather, to pull the data the right way.
UX. The corrupted data caused bugs in the GUI. They ran deep enough to be visible to everybody, and they were pretty bad and obvious. Fixed. We added an automated tester that validates the data snapshots along the pipeline. Antifragility.
Routing. One of the nodes our Routing Engine was running on started behaving badly, reporting hardware errors. We use virtualization, so that alone was not a big problem. But when we switched to the other nodes, the configuration went out of sync, and routing didn't work correctly for a week. We are going to implement a checklist and move closer to ITIL with DevOps. Antifragility.
Public Releases. We wanted to release five countries at once: Brazil, Argentina, Peru, Chile, and Bolivia. The pipeline was working well. Finding candidates for Signature Trips took some effort; this is not France or the United States. But the breadth of geographic coverage helps improve overall quality, so it's not a choice, it's a path. So far only Brazil has been released…
As of today, the site is up and running. More fixes are on the way. Stay tuned, be lucky, follow your curiosity.