Intel posted a blog article in mid-February about a research project that they call "Reliance Point". Based on the article, it appears that they are working on ways to selectively share data (addressing privacy and IP rights concerns) while providing integrity and isolation for that data. Intel refers to Reliance Point as a "trustworthy execution environment".
The environment is interesting in that it will bring together data from multiple providers, and allow the providers to perform calculations over the complete set. The providers have to agree on the algorithm that will be used to do the calculations and trust that the infrastructure will protect their data (and not allow other uses or algorithms to be executed, or the data to be revealed).
"Letting Data Breathe" is the name of the blog post. That title seems a bit exaggerated to me. For data to "breathe" (i.e., be integrated from multiple providers), there must be some standard set of semantics and structure that is supported by the providers, or there must be a way to map between the syntax and (more importantly) the semantics of the different providers. Otherwise, what do calculations mean when run against data with unknown structure, and/or unknown and disparate semantics?
There is no mention of data integration in the article, just trustworthy data availability and negotiated algorithms. But it seems to me that the project will not work if the problem of semantics is left to the data providers to solve out-of-band. In particular, how does one provider obtain the semantics of another's available data? How is this revealed while still protecting the IP rights of the provider? If proprietary data is shared, then it is likely proprietary all the way down to the layout and syntax of the data (perhaps defined by SQL). And I have known companies that are reluctant to share even partial database structures, since that information may reveal data or IP details.
To make Reliance Point work, something along the lines of OWL and RDF is needed: a way to specify semantics (OWL + SWRL/RIF, which I will discuss in another post) along with a way to handle multiple schemas (RDF). RDF defines a subject-predicate-object structure for data, which is very flexible. In principle, most structured data, including the contents of relational databases, can be translated into it. OWL and SWRL/RIF let you define equivalences, logical statements, disjointness and more, which are necessary to actually (semantically) integrate the data.
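To make the idea concrete, here is a minimal sketch in plain Python (no RDF library) of what I mean. It is not anything Intel has described. The providers, column names, namespaces and values are all invented for illustration, and the dictionary of equivalences is only a stand-in for what an OWL equivalentProperty statement would express. The sketch also assumes that both providers record the value in kilograms, which is exactly the kind of semantic detail that would have to be stated somewhere.

```python
# Hypothetical rows from two providers that describe the same kind of fact
# using different column names (all names and values here are invented).
provider_a_rows = [{"patient_id": "a1", "weight_kg": 70.0},
                   {"patient_id": "a2", "weight_kg": 82.5}]
provider_b_rows = [{"subject": "b1", "mass": 64.0}]


def rows_to_triples(rows, key_column, namespace):
    """Turn each row into (subject, predicate, object) triples."""
    triples = []
    for row in rows:
        subject = f"{namespace}:{row[key_column]}"
        for column, value in row.items():
            if column != key_column:
                triples.append((subject, f"{namespace}:{column}", value))
    return triples


# Stand-in for OWL equivalentProperty statements: both predicates are
# declared to mean the same thing and are rewritten to one shared name.
# Assumption: both providers use kilograms.
EQUIVALENT_PROPERTIES = {"a:weight_kg": "shared:bodyMassKg",
                         "b:mass": "shared:bodyMassKg"}


def integrate(triples, equivalences):
    """Rewrite predicates to their shared equivalent where one is declared."""
    return [(s, equivalences.get(p, p), o) for s, p, o in triples]


def mean_of(triples, predicate):
    """Average the object values of all triples with the given predicate."""
    values = [o for _, p, o in triples if p == predicate]
    return sum(values) / len(values)


# Pool both providers' data, then apply the declared equivalences.
merged = integrate(
    rows_to_triples(provider_a_rows, "patient_id", "a")
    + rows_to_triples(provider_b_rows, "subject", "b"),
    EQUIVALENT_PROPERTIES,
)
```

Calling mean_of(merged, "shared:bodyMassKg") then averages over all three records. Without the equivalence mapping, the agreed-upon calculation would see two unrelated predicates and would have nothing meaningful to combine.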
In theory, Reliance Point seems good, but Intel is working on the easier part of the problem (the infrastructure) and not the deeper problems that will prevent usage (integrating the data).
Andrea