by Ankita Dhandha
This is a living document. “Lorem ipsum” is placeholder for future content.
Open links on buttons & in footers.
See footer ↓ for page detail.
Table of Contents is upper right – navigate from there ⭜.
Right‐side content previews target content & usually is scrollable.
Presentation works on 💻 & 📱.
Release–0.3.9: SDL 3.9 baseline from Gregg Kellogg
Release–1.0: SDL 3.9 running under AWS/Lambda
Release–2.0: SDL running under AWS/EC2
Release–3.0: SDL with new client‐side features and UX
Release–4.0: SDL with new server‐side support for SHACL/ShEx
Release–5.0: SDL with new server‐side support for ontologies
Implement SDL on AWS/Lambda
Learn Lambda application limits to configure SDL as a Lambda application
Learn SDL/Lambda processing limits to determine graph size and complexity for SDL analysis
Test SDL/Lambda jobs that are too large/complex to run on SDL/Heroku (Gregg’s native implementation)
Begin to prepare SDL customers for two platforms: one for simple jobs on AWS/Lambda; one for complex jobs (future release)
Define a simple JSON–LD @Graph
Test simple @Graph on Schema.org Markup Validator (SMV)
Use A/B Testing: compare SMV and SDL reports using identical @Graph (SMV/Graph sameAs SDL/Graph) (≡)
On SMV report, click an @Type (PublicationIssue and/or ScholarlyArticle) to see report detail
begin A/B testing
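A minimal sketch of what a simple JSON–LD @Graph for this A/B test could look like, written as a Python dictionary. Only the two @Type names (PublicationIssue and ScholarlyArticle) come from the slide; the example.org identifiers, property values and the helper `types_in_graph` are hypothetical and shown for illustration only.

```python
import json

# Hypothetical identifiers and values; only the @type names come from the slide.
simple_graph = {
    "@context": "https://schema.org",
    "@graph": [
        {
            "@type": "PublicationIssue",
            "@id": "https://example.org/issue/1",
            "issueNumber": "1",
        },
        {
            "@type": "ScholarlyArticle",
            "@id": "https://example.org/article/1",
            "headline": "An example article",
            # Link the article to the issue it appears in.
            "isPartOf": {"@id": "https://example.org/issue/1"},
        },
    ],
}


def types_in_graph(doc):
    """Return the sorted @type names declared by top-level nodes of @graph."""
    return sorted(
        {node["@type"] for node in doc.get("@graph", []) if "@type" in node}
    )


if __name__ == "__main__":
    # Serialized form that could be pasted into SMV and SDL for comparison.
    print(json.dumps(simple_graph, indent=2))
    print(types_in_graph(simple_graph))
```

Submitting the same serialized text to both tools keeps the A/B comparison fair, since any difference in the reports then comes from the tools rather than the input.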
JSON–LD test on AWS/Lambda running SDL
SDL “search results preview” is same information embedded in SMV analysis but served in human-readable format
On SDL report, scroll down to see JSON–LD graph processing and analysis
continuing A/B testing
JSON–LD graph defines Natural Languages used to present content to readers (and intelligent devices) in language of their choice
@Language graph is more complex than previous A/B test
Select SMV report about @Types [e.g. Class (33 items) and/or DefinedTerm (3 items)] for detail
continuing A/B testing
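To illustrate how a graph can carry content in several natural languages, here is a small sketch using JSON–LD value objects (`@value` plus `@language`). The term, its translations, the example.org identifier and the helper `label_for` are assumptions made for illustration; they are not taken from the @Language graph itself.

```python
# Hypothetical node: one DefinedTerm whose name is given in three languages.
node = {
    "@context": "https://schema.org",
    "@type": "DefinedTerm",
    "@id": "https://example.org/term/avocado",
    "name": [
        {"@value": "avocado", "@language": "en"},
        {"@value": "aguacate", "@language": "es"},
        {"@value": "avocat", "@language": "fr"},
    ],
}


def label_for(node, prop, lang, fallback="en"):
    """Pick the value of `prop` tagged with `lang`, else the fallback language."""
    values = node.get(prop, [])
    if isinstance(values, dict):
        values = [values]
    by_lang = {
        v.get("@language"): v.get("@value")
        for v in values
        if isinstance(v, dict)
    }
    return by_lang.get(lang, by_lang.get(fallback))


if __name__ == "__main__":
    print(label_for(node, "name", "es"))
    print(label_for(node, "name", "de"))  # no German label, falls back to English
```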
@Language graph on AWS/Lambda
SDL “search results preview” is same information in SMV but served in human‐readable format
Scroll down to see JSON–LD graph processing and analysis
When processed on AWS/Lambda, SDL generates report about 2,204 “triples” defined in @Language graph
continuing A/B testing
This JSON–LD graph is the Ontomatica Knowledge Graph
Knowledge Graph uses about 20 @Type objects such as @Corporation, @Product, @Offer and @Dataset
SMV integrates (links) valid @Type and @Property relationships to create a single view: “Corporation”
continuing A/B testing
@Language graph on AWS/Lambda
SDL “search results preview” is same information in SMV but served in human‐readable format
Scroll down to see JSON–LD graph processing and analysis
When processed on AWS/Lambda, SDL generates report about 2,204 “triples” defined in @Language graph
continuing A/B testing
Force Directed Graph (FDG) of Ontomatica’s Knowledge Graph
Facts (entities & relationships) in FDG are identical to JSON–LD facts in SMV & SDL reports
Rotate/zoom/move FDG to see specific entities & relationships
Link highlighted in red features main entities on pages [mainEntityOfPage]
AWS/Lambda “duration window” limits file size for SDL processing
AWS/Lambda “size window” limits integration of optional SDL features
SDL/AWS/Lambda will process larger & more complex graphs than SDL/Heroku (Gregg’s SDL platform)
SDM server is faster than default AWS/Lambda server & will process graph files up to 2.5 MB
Use client–side methods to add features to SDL reports
Use CSS grid to create cells for specific SDL features
Use CSS lightbox to preview “cell + content”
Upon cell selection, lightbox displays “cell + content” preview in full screen
Build out “cell + content” design with existing SDL features such as table analysis, error messages and reasoner messages
Build out “cell + content” design with new features such as graph visualization
Jarno van Driel proposed new SDL features
New features are presented & discussed on Google Docs
Preview document using link below
Jarno–inspired CSS grid with six cells
Feature SDL table (current example injects sample data from Wikidata)
Feature one or more visualized graphs using processors e.g. D3.JS
Feature hierarchical view of structured data, similar to Schema Markup Validator
Feature parser statistics
Feature reasoner analysis (snippets)
Feature warnings & errors (here preview shows ~50% of full page content)
Production version of Jarno design
Sample uses simple case from SDL/AWS/Lambda Example 1-22
Cells feature: (1) search results preview (2) RDF (3) TTL (4) RDFa (5) JSON–LD beautified (6) RDF Grapher (7) tabular report (8) parser statistics (9) linter message from reasoner
Footer includes link to SDL Release 2.0 prototype running on AWS/Lambda
On following pages are seven views of a single JSON data source
Example 2-31: Circle Packing
Example 2-32: Sunburst
With server–side assistance, a newly generated JSON data structure could be similarly visualized
Example 2-32: Sunburst
With server–side assistance, a newly generated JSON data structure could be similarly visualized
Sunburst Zoom with Labels
With server–side assistance, a newly generated JSON data structure could be similarly visualized
Collapsible Boxes
With server–side assistance, a newly generated JSON data structure could be similarly visualized
Node-Link Tree
With server–side assistance, a newly generated JSON data structure could be similarly visualized
Treemap
With server–side assistance, a newly generated JSON data structure could be similarly visualized
A force directed graph (FDG) visualizes schema.org @Type and @Property specifications & relationships
Source data conforms to subject–predicate–object (?s ?p ?o) format
In contrast, flare.json structure (Examples 2-31 ↔ 2-37) uses hierarchical structure based on RDFS:subClassOf
With server–side assistance, a newly generated JSON data structure could be similarly visualized
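The contrast between the two data structures can be sketched in a few lines. The sample triples below are hypothetical (chosen to resemble schema.org classes), and the function names are assumptions; the output shapes follow the `nodes`/`links` and `name`/`children` layouts commonly used by D3 examples, not a format confirmed by SDL.

```python
from collections import defaultdict

# Hypothetical (?s ?p ?o) sample; not taken from the actual graph.
TRIPLES = [
    ("schema:Corporation", "rdfs:subClassOf", "schema:Organization"),
    ("schema:Organization", "rdfs:subClassOf", "schema:Thing"),
    ("schema:Product", "rdfs:subClassOf", "schema:Thing"),
    ("schema:Corporation", "schema:makesOffer", "schema:Offer"),
]


def to_force_graph(triples):
    """Flat structure: every subject/object is a node, every triple a link."""
    names = sorted({term for s, _, o in triples for term in (s, o)})
    return {
        "nodes": [{"id": name} for name in names],
        "links": [{"source": s, "target": o, "label": p} for s, p, o in triples],
    }


def to_hierarchy(triples, root):
    """Nested structure built only from rdfs:subClassOf triples (assumes no cycles)."""
    children = defaultdict(list)
    for s, p, o in triples:
        if p == "rdfs:subClassOf":
            children[o].append(s)

    def build(name):
        node = {"name": name}
        kids = [build(child) for child in sorted(children.get(name, []))]
        if kids:
            node["children"] = kids
        return node

    return build(root)


if __name__ == "__main__":
    print(to_force_graph(TRIPLES))
    print(to_hierarchy(TRIPLES, "schema:Thing"))
```

The force graph keeps every predicate, while the hierarchy drops everything except the subclass relation, which is why the two visual families need different server-side JSON.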
Develop consensus in SDL community & among interested parties about final design for Release 2.0 interface
Will need SDL server–side changes to generate JSON structure for D3.JS processing
Will need SDL server–side changes to generate JSON structure for Force Directed Graph processing
Create SDL preparation methods & production platform to analyze large graphs
SDL processor & reasoner objective:
analyze @Graph with 10 million statements (“triples”)
Refactored USDA National Agricultural Library Thesaurus (NALT) in schema.org
NALT/JSON–LD size: 6.84 MB
NALT/JSON–LD exceeds SMV 2.5 MB limit (no A/B analysis)
Alternative: configured SDL on AWS/EC2 server
SDL/NALT report size: 31.6 MB
SDL/NALT/AWS/EC2 processing time: 5 hours
NALT “triples”: 515,530
Ontomatica’s Web Enabled Directed Graph Engine (WEDGE) Reference Library is an application of National Agricultural Library Thesaurus
Research papers are mapped to schema.org JSON–LD structure in SDL report
Research papers are annotated using schema.org @Type and @Property grammar
WEDGE Reference Library contains information about 200,000+ papers
NALT “triples”: 515,530
Visualization does not include Taxa which is included in SDL report (Example 3-21)
Visualization uses same JSON–LD structure as used in SDL Release 2.0 design and prototype
Refactored US NIH National Cancer Institute Thesaurus (NCIT) in schema.org
NCIT/JSON–LD size: 13.7 MB
(no A/B analysis with Schema Markup Validator)
SDL/NCIT report size: 76.9 MB
SDL/NCIT/AWS/EC2 processing time: 9 hours
NCIT “triples”: 946,520
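The figures reported for NALT and NCIT allow a rough throughput calculation. The sketch below uses only the numbers on these slides plus the 2.5 MB SMV limit and the 10 million triple objective stated earlier; the linear extrapolation is an assumption, since actual scaling behavior of SDL on larger graphs is not documented here.

```python
# Figures as reported on the slides.
RUNS = {
    "NALT": {"triples": 515_530, "hours": 5, "jsonld_mb": 6.84, "report_mb": 31.6},
    "NCIT": {"triples": 946_520, "hours": 9, "jsonld_mb": 13.7, "report_mb": 76.9},
}
SMV_LIMIT_MB = 2.5


def triples_per_second(run):
    """Average processing rate over the whole run."""
    return run["triples"] / (run["hours"] * 3600)


def exceeds_smv_limit(run, limit_mb=SMV_LIMIT_MB):
    """True when the JSON-LD file is too large for an A/B comparison with SMV."""
    return run["jsonld_mb"] > limit_mb


def estimated_hours(target_triples, run):
    """Naive linear extrapolation (assumption: rate stays constant)."""
    return target_triples / triples_per_second(run) / 3600


if __name__ == "__main__":
    for name, run in RUNS.items():
        print(
            name,
            f"{triples_per_second(run):.1f} triples/s,",
            f"exceeds SMV limit: {exceeds_smv_limit(run)},",
            f"10M triples in about {estimated_hours(10_000_000, run):.0f} h",
        )
```

Both runs work out to roughly 29 triples per second, so under the linear assumption a 10 million triple graph would take on the order of four days on the same configuration.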
ChEMATIC (Chemical Entities with Medical Applications, Therapeutic Indications & Consequences) is an application of data from NIH NCIT & NIH Medical Subject Headings (MeSH)
Several other ontologies complement NCIT & MeSH JSON-LD structures
Biochemicals are mapped to hierarchical JSON-LD structures
Total ChEMATIC “triples” (structures and object maps): 700+ million
SDL/AWS/ECS is configured as a Docker container but improved methods will be needed to install SDL on best–available AWS/EC2 server
To reduce processing duration, need methods to use multiple CPU cores
SDL/AWS/EC2 is expensive to run — need to implement a business model to offset operating expenses
Support Shapes Constraint Language (SHACL) — a specification for validating graph–based data against a set of conditions
Support Shape Expressions (ShEx) — an RDF language for identifying predicates and their associated cardinalities and datatypes
Tim Berners‐Lee on SHACL & ShEx:
Shapes explain to machines what data should look like, independently of how that data is displayed to a user
Forms are a user interface allowing people to read and write data in a specific shape
Footprints explain to machines where new data should be stored
Ruben Verborgh on Shapes & Linked Data:
Apps should be coded against shapes [and] Linked Data so other apps can reuse them
[Where] vocabularies provide a list of possible attributes, shapes mandate a specific structure for data, combining attributes from vocabularies in a certain way
Footprints explain to machines where new data should be stored
Key findings in the US PubMed/NCBI article “Automatic Generation of SHACL Shapes from Ontologies”
OWL and SHACL are not equivalent in their interpretation
There are differences in how OWL interprets restrictions (for inferencing) and how SHACL interprets constraints (for validation)
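A toy illustration of that difference, in plain Python rather than a real OWL reasoner or SHACL engine. All `ex:` names are hypothetical. The same rule ("subjects of ex:measuredIn are ex:Observation") is read once as an axiom that adds facts and once as a constraint that reports problems.

```python
# Hypothetical data graph: one triple, no explicit rdf:type for ex:obs1.
DATA = {("ex:obs1", "ex:measuredIn", "ex:leaf")}

# Hypothetical rule: the property ex:measuredIn belongs to class ex:Observation.
DOMAIN = {"ex:measuredIn": "ex:Observation"}


def infer_types(triples, domain):
    """Inference reading: using the property implies the subject's class."""
    inferred = {
        (s, "rdf:type", domain[p]) for s, p, _ in triples if p in domain
    }
    return set(triples) | inferred


def validate_types(triples, domain):
    """Validation reading: report subjects that lack the declared class."""
    declared = {(s, o) for s, p, o in triples if p == "rdf:type"}
    return sorted(
        f"{s} uses {p} but is not declared as {domain[p]}"
        for s, p, _ in triples
        if p in domain and (s, domain[p]) not in declared
    )


if __name__ == "__main__":
    print(infer_types(DATA, DOMAIN))      # the type triple is added silently
    print(validate_types(DATA, DOMAIN))   # the same gap is reported as a problem
    # Validating after inference finds nothing to report.
    print(validate_types(infer_types(DATA, DOMAIN), DOMAIN))
```

The last call hints at why shapes generated automatically from an ontology need care: whether a graph passes can depend on whether inference ran first.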
Glucosinolates are natural components of many pungent plants such as broccoli, mustard, cabbage, and horseradish
US NIH NCI review of links between cruciferous vegetable intake & lung cancer risk concluded that high intake may decrease risk in the range of 17–23%
Other studies report similar risk reductions for colorectal, breast, kidney, esophageal, & oropharyngeal (mouth & throat) cancers
American Food Data Systems Institute (AFDSI) & Ontomatica participate in food & agriculture research projects
One WEDGE project integrated & synthesized glucosinolate data from many studies
WEDGE–Glucosinolates enables Principal Investigators & researchers to visualize relationships that otherwise are difficult to understand & analyze
Glucosinolate data was difficult to synthesize & integrate into a Knowledge Graph
Observations & measurement methods were irregular
Plant taxa & genetic variety data was regular, but ‘part of plant’ designations were irregular
Research process would have been easier & more accurate if shape data had been enforced during preparations & observations
Force Directed Graph represents integration of data specifications (from ontologies) & data constraints (to ensure data quality)
“Ontology part” of graph (taxa & ‘part of plant’) is visible in WEDGE–Glucosinolates
“Shape part” of graph (represented as SHACL in TTL format) is in footer
Diabetes is a debilitating & life-threatening disease
Research about & remedies for diabetes depend on precise information where “the devil is in the details”
This NCBI article is an overview
ChEMATIC is a WEDGE application to visualize relationships among biochemistry, factor inputs & human conditions
ChEMATIC does not document opinions (something is good or bad); it only documents items & their relationships
Medical & nutrition experts use ChEMATIC information to express opinions & advice
This graph visualizes data about Diabetes Mellitus, Type 2
Diabetes observation & monitoring are key parts of a personalized remedy
First we need to specify the shape of glucose observations
Then we need to integrate observation shapes with monitored glucose data
Example 4-44 illustrates an observation graph for glucose
Example 4-45 integrates
Example 4-46 integrates
Example 4-47 integrates
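What "specifying the shape of glucose observations" might mean can be sketched with a simplified stand-in for a shape. This is not SHACL or ShEx; the property names, type rules and cardinalities are assumptions chosen for illustration, and the shape is treated as closed (unknown properties are reported).

```python
# Hypothetical shape: property -> (allowed Python types, min count, max count).
GLUCOSE_OBSERVATION_SHAPE = {
    "observedAt": (str, 1, 1),
    "value": ((int, float), 1, 1),
    "unit": (str, 1, 1),
    "device": (str, 0, 1),
}


def validate(observation, shape):
    """Return a list of messages; an empty list means the observation conforms."""
    messages = []
    for prop, (types, min_count, max_count) in shape.items():
        raw = observation.get(prop, [])
        values = raw if isinstance(raw, list) else [raw]
        if len(values) < min_count:
            messages.append(f"{prop}: expected at least {min_count}, found {len(values)}")
        if len(values) > max_count:
            messages.append(f"{prop}: expected at most {max_count}, found {len(values)}")
        for value in values:
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, types):
                messages.append(f"{prop}: unexpected value {value!r}")
    for prop in observation:
        if prop not in shape:
            messages.append(f"{prop}: not allowed by the shape")
    return messages


if __name__ == "__main__":
    good = {"observedAt": "2024-01-01T08:00:00Z", "value": 105, "unit": "mg/dL"}
    bad = {"value": "high", "unit": ["mg/dL", "mmol/L"]}
    print(validate(good, GLUCOSE_OBSERVATION_SHAPE))
    print(validate(bad, GLUCOSE_OBSERVATION_SHAPE))
```

Checking each monitored reading against such a shape before it enters the graph is one way to avoid the irregular observations described for the glucosinolate project.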
Visualizing Dexcom Observation Data - Hourly
Visualizing Dexcom Observation Data - Daily
Visualizing Dexcom Observation Data - Histogram
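The three views (hourly, daily, histogram) correspond to three simple aggregations of the same observation series. The sketch below uses made-up readings, assumed to be in mg/dL, and hypothetical function names; it does not reflect the actual Dexcom export format.

```python
from collections import Counter, defaultdict
from datetime import datetime
from statistics import mean

# Made-up readings: (ISO timestamp, glucose value assumed to be in mg/dL).
READINGS = [
    ("2024-01-01T08:00:00", 95),
    ("2024-01-01T08:05:00", 105),
    ("2024-01-01T09:00:00", 140),
    ("2024-01-02T08:00:00", 180),
]


def mean_by(readings, fmt):
    """Average readings grouped by a strftime key (hour or day)."""
    groups = defaultdict(list)
    for timestamp, value in readings:
        key = datetime.fromisoformat(timestamp).strftime(fmt)
        groups[key].append(value)
    return {key: mean(values) for key, values in sorted(groups.items())}


def histogram(readings, bin_width=20):
    """Count readings per bin; each key is the lower edge of its bin."""
    counts = Counter((value // bin_width) * bin_width for _, value in readings)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    print(mean_by(READINGS, "%Y-%m-%d %H:00"))  # hourly view
    print(mean_by(READINGS, "%Y-%m-%d"))        # daily view
    print(histogram(READINGS))                  # histogram view
```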
Develop specification & design for implementing SHACL & ShEx in SDL
Simplify workflow that involves at least 2 source files (ontology & shape) & possibly more than one data structure (JSON-LD & TTL)
Explain at least three conditions: ontology messages, shape messages, & ontology/shape integration messages
Reconcile irregularity between ontology constraints & shape constraints
Support other ontologies — in addition to schema.org
In addition to @Context registration of vocabulary terms, support reasoning about ontology–specific grammar
Enable vocabulary & reasoning for SKOS–based datasets
Enable vocabulary & reasoning for OWL–based datasets
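As one example of ontology–specific reasoning, SKOS defines skos:broaderTransitive as the transitive counterpart of skos:broader. A minimal sketch of that entailment follows; the concepts are hypothetical `ex:` names, and the dictionary layout is an assumption, not how SDL stores SKOS data.

```python
# Hypothetical skos:broader assertions: concept -> directly broader concepts.
BROADER = {
    "ex:avocado": {"ex:fruit"},
    "ex:fruit": {"ex:plant-products"},
    "ex:plant-products": set(),
}


def broader_transitive(concept, broader):
    """All ancestors reachable by following skos:broader links (cycle-safe)."""
    seen = set()
    stack = list(broader.get(concept, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(broader.get(current, ()))
    return seen


if __name__ == "__main__":
    print(sorted(broader_transitive("ex:avocado", BROADER)))
```

A reasoner that only registers vocabulary terms through @Context would see the two direct links; grammar-aware reasoning would also report that ex:avocado falls under ex:plant-products.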
UN FAO AgroVoc is a SKOS–based dataset
AgroVoc is a multilingual controlled vocabulary covering all areas of interest to the Food & Agriculture Organization of the United Nations, including food, nutrition, agriculture, fisheries, forestry & the environment.
US Library of Congress Subject Headings (LCSH) is a SKOS–based dataset
The Library of Congress Subject Headings (LCSH) comprise a thesaurus (controlled vocabulary) of subject headings, maintained by the United States Library of Congress, for use in bibliographic records
Plant Ontology is an OWL–based dataset
“archegonium head” is referenced in WEDGE–Glucosinolates
Avocado Ontology is an OWL–based dataset
Avocado is a popular food & popular ingredient in other foods
US NIH PubChem is a multi–ontology dataset
PubChem is a database of chemical molecules & their activities against biological assays
Author: National Center for Biotechnology Information (NCBI); partOf United States National Institutes of Health (NIH)
More than 80 database vendors contribute to PubChem
Wedge–FNDDS (Food & Nutrient Database for Dietary Studies) is a multi–ontology dataset
FNDDS includes foods & beverages nutrition data reported in “What We Eat in America”
FNDDS is an application of OWL–based ontologies including AFDSI’s Vocal (acronym for the phrase “Vocabularium Alimentarum — Vocabulary of Food”)
Production issues will be more complicated than Release 3.0
May be difficult to load an SDL–instance configured with schema.org–based datasets + SKOS–based datasets + OWL–based datasets
Processing duration could be long (days!)