Posted: 09/05/2017
A very valuable post from Robert Nowak.

The UK MOD recently asked for feedback on whether anyone within NATO had experience with the Resource Description Framework (RDF) and the Simple Knowledge Organization System (SKOS). At the NATO Joint Warfare Centre (JWC) we experimented in 2013 with the development of a client-server system (slide 1) using the English Wikipedia dataset that fused RDF triples and textual data, as well as hosting and displaying exercise geo data based on the Defence Geospatial Information Working Group (DGIWG) Feature Data Dictionary (FDD) and the concept of GeoSPARQL. Based on this experience, the following response (edited for posting) was provided.

o An RDF implementation is fundamentally based on terminology and taxonomies/thesauri; therefore, what taxonomies or thesauri have you been using?

There are no existing ontologies or schemas that serve this purpose. This is a problem when you see the number of unique ways that different systems have chosen to evolve, some using data triples/RDF, others not (slides 2-6). My strategy was to use the most accepted ontologies and to pick and choose what was needed. An example extract can be seen in slide 7 (this extract does not include some custom terms used for the Wikipedia part). You will notice that DGIWG registers and concepts all had to be included, for which there is no existing namespace or convention. This is the core standardisation and interoperability problem that needs to be solved.

o Are you using a specialist (not object-oriented or relational) database structure or data processing technique(s) to implement RDF? If so, what database structure or processing technique(s) are you using?

I tested Jena/Fuseki and Parliament before settling on Virtuoso as the only practical solution that could easily handle the millions of triples required. The ideal situation would have all the data stored as RDF triples as in slide 8, which was certainly within reach for the project but unachievable due to time constraints. The actual demonstrator (slide 9) stored Wikipedia textual data in a custom-made server and relied on DBpedia data stored as RDF triples for the attributed information. Solr provided the search capability, and I would expect a separate search server to continue to exist in any production system because of the importance of a very fast search capability (which large RDF datasets and SPARQL will not easily give you).

Note: Since the work was done in 2013 there may have been software advances which make my previous experience more or less obsolete.

o Is your project leveraging auto-population of metadata (obviously, from textual information like reports)? If so, what process and/or software are you using?

As suggested above, the easy approach was to use DBpedia as a ready-made solution to this problem, which has its pros and cons. Beyond that there are of course many different views about what the most desirable system would actually accomplish and how to do it, depending on the nature of the information being stored and the extent to which it has a spatial component. Some would argue for completely encoded data, like the Terrorism Knowledge Base example (slide 10) for textual data and OpenStreetMap attributed features (slide 11). This is highly desirable if you want to use semantic queries fully, and I would argue that even if not, the elegance of storing all data in one system will streamline and simplify the client/server interaction considerably.
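To make that idea concrete, here is a minimal, self-contained sketch in Python with rdflib. It is not the demonstrator's actual code (which ran against Virtuoso); it mixes the standard SKOS and GeoSPARQL vocabularies with a hypothetical placeholder namespace for DGIWG FDD concepts (http://example.org/dgiwg/fdd#, invented here precisely because no official namespace exists), then answers a single semantic query spanning labelling and geometry in one pass.

```python
# Minimal sketch (Python, rdflib) of a fused store: standard vocabularies
# (SKOS, GeoSPARQL) alongside a *hypothetical* DGIWG FDD namespace.
# http://example.org/dgiwg/fdd# is a placeholder invented for illustration,
# since no official DGIWG namespace or convention exists.
from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import SKOS

GEO = Namespace("http://www.opengis.net/ont/geosparql#")
FDD = Namespace("http://example.org/dgiwg/fdd#")  # placeholder, not official

g = Graph()
g.bind("skos", SKOS)
g.bind("geo", GEO)
g.bind("fdd", FDD)

# One exercise feature, typed against an (assumed) FDD concept "Bridge",
# carrying both a SKOS label and a GeoSPARQL WKT geometry.
bridge = FDD["feature/B0001"]
geom = FDD["feature/B0001/geom"]
g.add((bridge, RDF.type, FDD.Bridge))
g.add((bridge, SKOS.prefLabel, Literal("Old Town Bridge", lang="en")))
g.add((bridge, GEO.hasGeometry, geom))
g.add((geom, GEO.asWKT, Literal("POINT(5.733 58.970)", datatype=GEO.wktLiteral)))

# A single semantic query spanning both vocabularies -- the kind of
# client/server simplification argued for above.
results = g.query("""
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
    PREFIX fdd:  <http://example.org/dgiwg/fdd#>
    SELECT ?label ?wkt WHERE {
        ?f a fdd:Bridge ;
           skos:prefLabel ?label ;
           geo:hasGeometry/geo:asWKT ?wkt .
    }
""")
for label, wkt in results:
    print(label, wkt)
```

Pointed at a Virtuoso or Fuseki SPARQL endpoint instead of an in-memory graph, the same query illustrates what full encoding buys you: one store, one query language, one round trip.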
Alternatively, you can have your RDF data somehow mixed with the text, like the example in slide 12, or kept separate, like our demonstrator shown in slide 13.

For me this is the big open question: what is the best solution in a given context regarding which data to encode, to what extent, and how it should be presented to the user to interact with? Should the text and the attributed data be linked, merged or kept separate, and how? This is the big question to be addressed by experimentation. My aspiration for the demonstrator would have been to test the effectiveness of some different strategies in the context of scenario development and also exercise play.

o What pitfalls/challenges have you come up against and how did you overcome them?

The biggest problem in my mind is deciding what information such a system should contain and which community of interest it is intended for. Then, as mentioned above, the main issue in designing the schema is what to encode and what not to encode. Or to put it another way: if the NATO Geospatial Information Model (NGIM) were implemented in RDF and a geospatial dataset were stored as triples, would it be beneficial? If not, then which model and what data should it be? Unfortunately we have not had the opportunity to try this out with actual scenario developers or exercise participants so far.

R.W. Nowak
Geospatial Contractor
Joint Exercise Division
NATO Joint Warfare Centre
Stavanger, Norway