by Vas Mylko
We collect the programming problems we run into along the way in a dedicated Trello card named HACKATHON IDEAS. Sometimes we run hackathons ourselves. Most of the problems are still waiting for enthusiasts to pick them up and solve them. We have not structured them for public release yet…
One such problem was creating DBvoyage from Wikivoyage, the way DBpedia was created from Wikipedia. DBpedia is a project that aims to extract structured content from the semi-structured information in Wikipedia. Likewise, DBvoyage could be a project that extracts structured content from the semi-structured Wikivoyage.
Some time ago a student, Andrii Maistruk, approached us and asked for an interesting task. We talked about what we do, what kinds of problems we had identified, and what could be carved out as an isolated, general-purpose task that would not interfere with our IP and might be useful to others. He liked the semantic tasks. We discussed the DBvoyage idea, and Andrii took it.
He analyzed the DBpedia Extraction Framework and the Wikivoyage dump, and extracted the main subjects, predicates, and objects. We focused only on what belongs to what, so this is about 10–20% of what could be extracted (other predicates and properties remain). The end result had to be an up-and-running SPARQL end-point for trying semantic queries over Wikivoyage.
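To make the "what belongs to what" idea concrete, here is a minimal Python sketch of how one such relation could be written down as an N-Triples line. This is not Andrii's C++ code. The base URI and the hasAttraction property come from the sample query in this post; the helper names, the title-to-URI rule, and the example attraction are assumptions made for illustration.

```python
from urllib.parse import quote

# Base URI taken from the sample query in this post.
BASE = "http://ec2-13-59-194-238.us-east-2.compute.amazonaws.com/ontology"


def article_uri(title: str) -> str:
    """Build an article URI from a page title (assumed rule: spaces become underscores)."""
    return f"{BASE}/article/{quote(title.replace(' ', '_'))}"


def to_ntriple(subject_title: str, predicate: str, object_title: str) -> str:
    """Serialize one subject-predicate-object relation as a single N-Triples line."""
    subject = article_uri(subject_title)
    prop = f"{BASE}/property/{predicate}"
    obj = article_uri(object_title)  # assumption: objects are articles, not literals
    return f"<{subject}> <{prop}> <{obj}> ."


# "Lviv Opera" is a made-up example object, not taken from the real data set.
print(to_ntriple("Ukraine", "hasAttraction", "Lviv Opera"))
```

Lines in this shape are what a triple store ingests, one statement per line.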
Andrii used C++ to massage the data :-O Who uses C++ nowadays, except for maintaining older codebases? Here is DBvoyage on GitHub. Here is the README. Here is the SPARQL end-point to play with. It is hosted on a tiny AWS instance. If demand grows, we can host it on a bigger node in our data center. For now, kudos to Andrii Maistruk for making it work \m/
How to play with it? To query all the attractions DBvoyage knows about in Ukraine, go to the SPARQL end-point and paste this code:
SELECT ?attractions
WHERE {
  <http://ec2-13-59-194-238.us-east-2.compute.amazonaws.com/ontology/article/Ukraine>
  <http://ec2-13-59-194-238.us-east-2.compute.amazonaws.com/ontology/property/hasAttraction>
  ?attractions
}
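If you would rather send the query from a script than paste it into the web form, something like the Python sketch below should work with any end-point that speaks the standard SPARQL protocol over HTTP. The ENDPOINT value is a hypothetical placeholder, so substitute the real address of the end-point. The function names are mine, and I am assuming the server can return results in the standard SPARQL JSON format.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical placeholder: replace with the real SPARQL end-point address.
ENDPOINT = "http://example.org/sparql"

QUERY = """
SELECT ?attractions
WHERE {
  <http://ec2-13-59-194-238.us-east-2.compute.amazonaws.com/ontology/article/Ukraine>
  <http://ec2-13-59-194-238.us-east-2.compute.amazonaws.com/ontology/property/hasAttraction>
  ?attractions
}
"""


def build_request(endpoint: str, query: str) -> Request:
    """Prepare a SPARQL-protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def extract_values(results: dict, variable: str) -> list:
    """Pull the values bound to one variable out of a SPARQL JSON result document."""
    bindings = results["results"]["bindings"]
    return [row[variable]["value"] for row in bindings if variable in row]


def run_query(endpoint: str, query: str) -> dict:
    """Send the query over HTTP and decode the JSON response."""
    with urlopen(build_request(endpoint, query), timeout=30) as response:
        return json.load(response)
```

With a real end-point address in place, extract_values(run_query(ENDPOINT, QUERY), "attractions") gives a plain list of whatever the ?attractions variable was bound to.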
PS.
There was another student whose mission was to load the triples into Dgraph (by the same guy who made graphd, the workhorse behind Google’s Knowledge Graph since the acquisition of Metaweb) and then compare the performance of Dgraph with Virtuoso. That mission is moving slowly for now…