Queries in redundant systems – Edge Intelligence

Data Redundancy

Consider a system with built-in redundancy for high availability, where two servers (A and B) receive the same data, either through independent feeds or through asynchronous replication between them. Servers A and B are unlikely to be in perfect synchronisation at any particular point in time. For example, server A may be more up to date than server B because server B has temporarily lost the connection to its feed; or server B may have more data because server A has only just come back online.

So we should assume that, at any given time, one server has more recent data than the other and/or one server has more complete data than the other. The two servers will rarely be in perfect synchronisation, especially if data is arriving at a reasonable speed.
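
To make the difference between "more recent" and "more complete" concrete, here is a minimal Python sketch. It assumes each server holds some subset of the same feed of timestamped messages. The names `summarise`, `server_a` and `server_b`, the sample timestamps, and the use of a row count as a stand-in for completeness are all hypothetical and are not part of Edge Intelligence.

```python
from datetime import datetime

def summarise(messages):
    """Return (latest timestamp, row count) for one server's copy of the feed."""
    latest = max(messages) if messages else None
    return latest, len(messages)

# Hypothetical feed: one message per minute from 12:00 to 12:09.
feed = [datetime(2024, 1, 1, 12, m) for m in range(10)]

# Server A missed minutes 3 to 6 while offline, but now holds the newest message.
server_a = [t for t in feed if not 3 <= t.minute <= 6]
# Server B holds every message up to 12:07, but has not yet received the last two.
server_b = [t for t in feed if t.minute <= 7]

latest_a, count_a = summarise(server_a)
latest_b, count_b = summarise(server_b)

print("A is more recent:", latest_a > latest_b)
print("B is more complete:", count_b > count_a)
```

In this made-up case neither server is simply "better": which one to prefer depends on what the query is trying to achieve.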

Query Goal

A query for data may be served by either of these two servers, and will normally choose whichever one is available at the time of the query. But what if both servers are available? Which one should be chosen? How do we know we got the most recent data available? What if we are most interested in the most complete data instead?

In these situations there is therefore another dimension to the query, one that conventional query languages do not express. This dimension can be regarded as a goal for the query: pick the latest data or the most complete data from what is available (or perhaps we simply don't care). The query can then compare server state with the query goal to pick the most suitable server from those currently available.

This is precisely how queries in Edge Intelligence work. In addition to the normal SQL syntax, the user can express a query goal with an optional USE GOAL clause, which indicates the priority to use when choosing between available servers. For example:

SELECT max(datetime)
FROM messages
USE GOAL recent;

As you would expect, this directs the query to choose a server with the most recent data, when there is a choice of servers available.

In fact, there are five different goals that a user can specify in a query:

Recent - pick the server with the most recent data
Complete - pick the server with the most complete data
Response - pick the server with the fastest response time
Available - just use any server that happens to be available
Balance - pick the server which is least busy
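
The idea of comparing server state with a goal can be sketched in Python as follows. This is only an illustration of the concept, not how Edge Intelligence implements it: the `ServerState` fields, the `GOAL_KEYS` table, the `pick_server` function, and the metrics used for each goal (row count for completeness, observed response time, number of active queries for load) are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class ServerState:
    name: str
    available: bool
    latest: datetime      # timestamp of the newest row held (assumed metric)
    row_count: int        # stand-in for completeness (assumed metric)
    response_ms: float    # recently observed response time (assumed metric)
    active_queries: int   # stand-in for how busy the server is (assumed metric)

# For each goal, a key function where the largest value wins.
GOAL_KEYS = {
    "recent": lambda s: s.latest,
    "complete": lambda s: s.row_count,
    "response": lambda s: -s.response_ms,
    "available": lambda s: 0,            # no preference: first available server
    "balance": lambda s: -s.active_queries,
}

def pick_server(servers, goal="available"):
    """Choose the available server that best matches the goal."""
    candidates = [s for s in servers if s.available]
    if not candidates:
        raise RuntimeError("no server available")
    # max() returns the first candidate when several tie on the key.
    return max(candidates, key=GOAL_KEYS[goal])

servers = [
    ServerState("A", True, datetime(2024, 1, 1, 12, 9), 6, 40.0, 5),
    ServerState("B", True, datetime(2024, 1, 1, 12, 7), 8, 25.0, 2),
]

for goal in GOAL_KEYS:
    print(goal, "->", pick_server(servers, goal).name)
```

With these made-up figures, the "recent" goal selects server A while "complete", "response" and "balance" select server B, which shows why the goal has to be stated separately from the query itself. Unavailable servers are filtered out first, so a goal only ever decides between servers that could answer the query at all.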

In an Edge Intelligence network, server redundancy at the edge of the network is standard practice and therefore it is useful to express a goal for choosing the most appropriate servers from those available at query time.
