Rhize Up Podcast
Episode 12 Transcript
[David]
All right, let’s fire this thing off. Good morning, good afternoon, good evening, and welcome to another episode of the Rhize Up podcast. And today we’re going to be talking about transactional data.
There are a lot of different ways people are handling it. It seems like we have a really good grasp on what to do with time series data, or SCADA data, or telemetry data. But once we start getting into exchanging transactional data, that’s where we start running into some challenges.
So we’re going to talk a little bit about how do we work with transactional data today. And we’re going to welcome back into the hot seat, Jeroen. He’s been on previous episodes, I encourage you to check those out.
But if this is your first time listening to the podcast and you’re not familiar, Jeroen, could you take a minute to make a quick introduction?
[Jeroen]
Sure. My name is Jeroen Janssen, and I’m an ISA-95 consultant with Rhize for about two years now. And I’m happy to be here.
Describing Transactions and Transactional Data
[David]
All right. Thanks. Thanks again for agreeing to participate in it.
So let’s start again. We’re talking about transactional data. And on a previous episode, we talked about the two types of databases in a generic sense.
There’s your time series data, there’s your event data, and it seems like transactions are more around the latter. That’s where we have an event and something needs to occur. So maybe just for a level set, and of course, maybe we can never define it, but we can at least describe it.
What are transactions, maybe events are appropriate in here, and then what’s transactional data? Could you talk a little bit about that?
[Jeroen]
Sure. So for transactional data, think about when a certain part of the equipment or machine started, and you need to record that, and you need to make sure that that transaction or that data is really captured. So if you start the equipment or start a job, you want to capture that information.
And also when you want to complete it, you want to capture that again. Also think about material consumptions or production. It’s very important to have those.
So those need to be captured and need to be transactional. You don’t want to receive it twice, or you end up with double consumptions or double productions.
[David]
So the event is something that’s of interest, and then there’s now going to be a transaction between that system that, say, captured that event, could be your level two, your level three system, and now we need to send a payload there. So that’s going to be the so-called transactional data, I think is really what we’re getting at. So, all right, perfect.
So I think from just a level set, that’s really what we’re understanding for events, transactions and transactional data. That’s what we mean when we say that. So now let’s get into, we understand what these are.
Now how do we want to start implementing the capturing of these events, the development of the transactions and the transactional data itself? So let’s walk through a process of what’s going to be the first step in how we go about approaching developing transactional data, and how do we want to work with that? So maybe let’s talk about the when, maybe when you start capturing that.
First steps in approaching Transactional Data - the When?
[Jeroen]
Right. So I would say, depending on the use case, we want to see what we need to capture to get to that certain use case. If we look at ISA 95, the standard, it’s got a lot of objects, a lot of models, but we don’t need all of them, only model what we need.
So I think that’s where we start. We look at what kind of data do we need to exchange, where do we get it from? And then a quick start is get a sequence diagram in place.
So a sequence diagram basically describes each of the applications and what the data flow would look like. So where’s it coming from? When is the event triggered?
What information is then exchanged? What kind of information is exchanged to another system? How is that handled in the other system?
Is that stored? Does that trigger something else, maybe a workflow? And then optionally, you’ve got some feedback to the original, to the sender with an acknowledge message or an error message that is handled correctly.
[David]
And in these events, that can either be done or that could get triggered from an operator, say, clicking a button on a control screen, or it could be that there’s just a message queue or an enterprise service bus that’s listening for something, and that’s going to fire that off. But that’s the event, that’s the when, that we want to start documenting and mapping.
[Jeroen]
Yeah. And triggering could basically be anything. It could be an operator clicking a button, like you said, but also maybe something happening in the equipment that is being sent up to a layer three system.
Also think about the barcode, which is scanned, or maybe a truck arriving at the dock to unload the raw materials, could be anything. Okay.
[David]
So there are all types of events that occur, and there’s a when for each of them. So when you’re starting to work through these sequence diagrams, do you just map out, well, this is what’s going to happen, and then here we go, we’re all done? Or are there certain things we need to be cautious of? It’s sort of, there’s the happy path and the unhappy path.
[Jeroen]
Well, I think the first thing you need to think about is this transactional data. Is it an issue if it happens twice? If we receive it twice, can we see that we received the same message twice?
So does it have a unique ID that lets us say, ah, we already had this message, we processed it, we can leave it out? Or maybe the data gives the same information again, so we can just overwrite it and there’s no issue. And then we look at the happy path.
So we process the data. Let’s say we see it’s a new message, we process the data, and we can look at whether the message that we receive requires an acknowledgment message. There are some messages that we really need to receive; we can build a mechanism around that and send an acknowledgment message saying we processed it, or maybe an error message saying we can’t process it because some data is missing or there’s some issue with the data. Or we didn’t receive it at all, and then you should get an error message on the other end. So that’s basically it: you get a message, you send an acknowledgment message, and there’s some error handling that needs to be done around that.
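As a rough sketch of that duplicate check, in Python: each message carries a unique ID, and a handler that has already seen the ID acknowledges it without reprocessing. The `id` field name and the in-memory store are illustrative assumptions; a real system would use persistent storage.

```python
# In production this would be persistent storage, not an in-memory set.
processed_ids: set[str] = set()

def handle_message(message: dict) -> str:
    msg_id = message.get("id")
    if msg_id is None:
        return "error: missing message id"  # cannot deduplicate without an ID
    if msg_id in processed_ids:
        return "ack: duplicate, already processed"  # safely ignored
    # ... process the payload here (consumption, production, job start) ...
    processed_ids.add(msg_id)
    return "ack: processed"

print(handle_message({"id": "m-001", "qty": 5}))  # ack: processed
print(handle_message({"id": "m-001", "qty": 5}))  # ack: duplicate, already processed
```

The second call changes nothing, which is exactly the property you want when a sender retries.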
[David]
Excellent. Yeah, so when you’re talking about whether we can send that same message twice, that concept comes from mathematics: idempotence. If I send the same thing through whatever this function is, I’m going to get the same result. Well, the same thing is true here. And it sounds like that’s part of the consideration: when I trigger this event, I may accidentally send that payload twice. Are we going to be able to handle that, or a case where there’s something wrong with the message?
So we need to have some error handling, and then it sounds like maybe we need an acknowledgement, something needs to come back and say, yep, we’re good, or maybe we’re not good. So sounds like that’s what the sequence diagram is going to capture for us.
[Jeroen]
Correct. And it’s always important to see what the other equipment is sending, and how we can make it as simple as possible. Think about a consumption counter. For production or consumption, typically what we do is use a counter that continuously counts up, and when it reaches the maximum value of that integer, it rolls over to zero.
So when we receive a message, we know the starting point. Let’s say that every minute you get a new counter update, and that counter slowly increases over time while you’re producing or consuming. Then at the end of your job, or when you want to calculate how much was consumed over a certain time, you can take the delta of that counter.
And if that counter goes past the maximum value and back to zero, you know that, hey, my counter value is lower than the previous one, so you can calculate up to the maximum value, and then from zero to the value you received, and still get the delta. The benefit of doing that is that if you miss a message, the next one you get still gives you the delta.
So you’re still going to get the information from that missing message. This is just an example of how you can handle a missing message, where that’s possible, and make it more robust and as simple as possible.
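A minimal sketch of that rollover-aware delta, assuming a counter that wraps from its maximum back to zero. The 16-bit maximum here is just an example; use the equipment’s actual range.

```python
MAX_COUNT = 65_535  # assumed 16-bit counter; use the equipment's actual maximum

def counter_delta(previous: int, current: int, max_count: int = MAX_COUNT) -> int:
    """Amount produced/consumed between two readings, handling rollover."""
    if current >= previous:
        return current - previous
    # Counter wrapped: count up to the maximum, then from zero to current.
    # The +1 accounts for the single increment from max_count to 0.
    return (max_count - previous) + current + 1

print(counter_delta(100, 130))   # 30
print(counter_delta(65_530, 5))  # 11, across the rollover
```

Because each delta is taken against the last reading you actually received, a missed update is absorbed by the next one, as long as less than one full wrap is missed in between.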
[David]
Yeah. Yeah. It always seems like the best laid plans, and there’s always going to be a lot of gotchas that go into it.
So I think certainly this is where you’re going to start capturing the sequence diagrams. There’s the happy path. There’s all the things that we conceivably can go wrong within that.
And we get it there. So now that we understand the when, we’ve created some sequence diagrams. We know what all these potential flows could look like and the amount of data exchange.
What are some of the technologies we can use for the actual exchange of that? It seems like maybe I can do some things synchronously. Maybe I want to do things asynchronously.
So as we start talking about that, maybe let’s just define a little bit or at least put some understanding around what do we mean by a synchronous message? What do we mean by an asynchronous message?
Synchronous vs. Asynchronous Messaging
[Jeroen]
Yeah. If the message is synchronous, the sender sends a request or sends out the message, and it’s going to wait there. It’s going to sit and wait until it gets the response.
The benefit of doing that is that there is a thread, or part of the application, which is waiting. So you either get a message back, an acknowledgment, or you get a timeout because that message never arrives. You’ve got a worker thread sitting there, waiting for that message, to check that everything went correctly.
If we look at async communication, it shoots off a message and then that worker thread, or simply the application, continues. And at a certain point in time, the sender receives an acknowledgment message, but it’s not by any means linked to that original message. So all that linking and waiting needs to be built manually.
There’s a different structure there. That means you should have a sort of watchdog that is looking: hey, what are the messages that I sent out? Which message is still waiting for a response?
Or maybe something has timed out. And if you receive an acknowledgment message, it needs to take that off the queue: okay, this is done. All that handling needs to be built into the application; it’s not built into the communication channel itself.
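That watchdog bookkeeping could look roughly like this in Python. The class and method names are illustrative; a real implementation would persist the pending list and hook into your messaging client.

```python
import time

class AckWatchdog:
    """Tracks outgoing async messages until their acknowledgments arrive."""

    def __init__(self, timeout_s: float = 30.0):
        self.timeout_s = timeout_s
        self.pending: dict[str, float] = {}  # message ID -> time sent

    def sent(self, msg_id: str) -> None:
        # Remember the message and when it went out
        self.pending[msg_id] = time.monotonic()

    def acknowledged(self, msg_id: str) -> bool:
        # True if this ack matches a message we were still waiting on
        return self.pending.pop(msg_id, None) is not None

    def timed_out(self) -> list[str]:
        # Messages that have waited longer than the timeout
        now = time.monotonic()
        return [m for m, t in self.pending.items() if now - t > self.timeout_s]
```

A periodic task would call `timed_out()` and trigger retries or error handling for anything it lists, which is the manual linking-and-waiting Jeroen describes.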
[David]
So when would I want to use synchronous versus asynchronous? I mean, it seems like it’s two different technologies here. I can make a SQL query, I can make a REST call.
Those would be examples of synchronous. Or maybe I’m just going to, there’s a messaging channel out there, some sort of message queue. When would I want to use one versus the other?
What are some of the considerations there?
[Jeroen]
That is a very, very good question. First of all, if we put Rhize in place, or a manufacturing data hub, or maybe an MES system, typically a lot of communication channels are already in place.
I think the customer already has an ERP system. They already have a SCADA system and some systems they don’t tell you what kind of channels that they can work with. They just have this one channel and that’s what you need to work with.
So that’s the first thing you need to look at. Another thing is that if you have a synchronous call and your application cannot continue because it needs to wait for that information to get back, then there’s no issue in using that. But if you think about more like, I want to completely decouple that, or you want to send a message and you want to have multiple applications looking at that same data, then you can look at getting an async communication in place.
And I’m quite sure that there are way more use cases where async would work better than sync.
How is ISA-95 used in these Models? Are there other Standards? - the What?
[David]
Excellent. Yeah, there’s a lot that goes into it for sure. Generally, the rule of thumb I follow is that if you’re going to make a query and you need to have an immediate response that’s synchronous, where if you just want to fire off a message or you want to have a many-to-many or one-to-many and you’re not wanting to wait around, then asynchronous is good for that as well.
So certainly not the scope of what it is we’re talking about, but I think a great consideration. So one of the concepts that we brought up, and I believe it was on a previous podcast, is that when you start going around and modeling data or you approach modeling data, certainly we’ve talked a lot about semantic data models and the like, if you don’t already have a standard for doing that, see if a standard exists. And during the introduction, you mentioned that you are an ISA 95 consultant.
I think of you as the ISA 95 guru. Is ISA 95 something we want to consider? Let me throw you a quick softball here.
So how would you, I think maybe I’m going to answer the question, how would you use ISA 95 for doing these message models, the what of what it is we’re trying to do?
[Jeroen]
Yeah. So if you look at ISA-95 parts five and six, they describe how two separate applications that are not ISA-95 per se could communicate, and how you can take non-ISA-95 information and put it into an ISA-95 structure. That is what part five describes.
And then part six describes how you can send that over some kind of communication channel. And on the other end, where you have your other application, the receiver, how you can translate that ISA-95 information back into non-ISA-95 information. That is typically what parts five and six describe.
And then if you use ISA 95 as a whole, so part two, part four, you already have those objects. So basically what part five says is, okay, I’ve got a lot of objects from part two, part four. They describe what the structure of that data is.
And part five describes, okay, how do I put that into a message, which I can actually exchange with another application?
[David]
Yeah. As I’ve read part five, the standard, I call that the verb section where we’re going to do a push. What are the verbs that are associated with that?
I am going to do a pull. What are the verbs associated with that? Or I’m going to do a publish.
And of course, there’s some very specific verbs that get used with that. And then when I get into the part six, that’s, I call that the nouns. I know there’s a little bit more that’s there, but, you know, so here’s like the predicate.
And then we’re going to have the object of the predicate. You know, it’s the verb noun and how everything’s moved back and forth. And those nouns, of course, are going to be the information objects that are outlined by two and four.
That’s how I’ve digested parts five and six, you know, maybe in layman’s terms. Of course, there’s more to it than that. But there’s push, there’s pull, there’s publish.
And then, you know, what’s the information that’s going to be passed? How do we want to model that? That’s the model.
So is there a standard? I mean, ISA 95 is one of them. Are there other standards that people should consider for doing these types of transactional data and moving it back and forth?
[Jeroen]
Yeah, there are a few of them. The one that I use a lot is B2MML. It’s basically an implementation of those two standards.
B2MML originally was XML; it describes an XML message. But starting from, I think, version seven,
they introduced JSON as well. So it includes a JSON schema, and you can use that for schema validation when you receive a message.
And that gives you a complete structure. Now, typically, what I do is what we talked about before: ISA-95 has a lot of objects, a lot of attributes.
And I only implement the ones that I really need. Typically, what I do is we’ve got a message that is quite big. Because you can have all these variables that you can send over.
What I do is I narrow down that message. So I keep the structure, the b2mml structure. But I take away or comment out the ones that we don’t need.
And that way, that message becomes way smaller and easier to understand for the customer. And we use that structure, that schema. And we still might use the schema validation on the complete message.
But if we describe, we only describe the ones that we need. And we’re going to exchange and pick up the values that we need and receive.
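As an illustration only, a narrowed-down message might look something like this. The field names loosely follow B2MML’s ApplicationArea/DataArea shape, but they are invented for this example rather than copied from the actual schema:

```json
{
  "ApplicationArea": {
    "Sender": { "LogicalID": "MES" },
    "CreationDateTime": "2024-05-01T08:30:00Z",
    "BODID": "m-001"
  },
  "DataArea": {
    "Process": {},
    "ProductionPerformance": [
      { "ID": "JOB-123", "Location": { "EquipmentID": "LINE-1" } }
    ]
  }
}
```

The full structure stays schema-valid, but only the handful of fields the use case needs are actually populated.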
[David]
So if you’re commenting those out, just a thought that I had: it means you only use what’s needed. But it seems like maybe in the future there’s going to be a use case that says, yeah, I’m going to need that information. And that lends itself very well to: well, I’ve already built the model.
I’ve already done the schema validation. So now if I need to have new, there’s a new use case, all I need to do is add it back in and makes the extension kind of easy. Does that make sense?
[Jeroen]
Yes, completely correct. Like the fields you didn’t use, you start filling those in. And at the other end, you start reading those out and processing that data.
But you know that structure is already there. The schema validation is already there. So yeah, it’s very easy to extend.
And you don’t need to rewrite anything. It’s already there. We just don’t use that message or that tag or that placeholder at that moment.
Tools and Resources in handling these Interfaces
[David]
Excellent. Cool. Yeah, I just thought that just popped into my head of makes it very easy.
Start with what you need, start small, but extend it. Now you already have a mechanism for doing that exchange. So we talked a little bit about the when.
We’ve built a sequence diagram of when these things occur, what’s going to be the information, or how do we want to get that information around. Then we talk about the what. So this is that B2MML or the ISA95 part 5 and 6 using parts 2 and 4 of the models.
But we’re big on documentation. So when I now get ready to document this so we can start building these systems and doing all that development, what are some of the tools? What are some of the resources that are available that people can utilize for handling these different types of interfaces that are going to be built?
[Jeroen]
Yeah, for synchronous communication, I use OpenAPI. And for the async communication, I
use AsyncAPI. And with those, you basically describe your communication.
It not only describes the structure of the document, but also the channels you’re going to use. So that’s basically parts five and six together in one document.
And you describe that in YAML, which is very simple to type out. And I think with the latest version, you can also embed your JSON schema, and it generates that same layout.
And what you can do with both of those tools is, in addition to the structure, you can also add examples in there. You can add a description. You can say, okay, this is used,
this is mandatory. Maybe not for the schema, but it is mandatory for our implementation. So you can add an additional layer of information, of documentation, on top of that.
And out of that documentation, the OpenAPI or AsyncAPI, you have different templates, I would call them, to generate either your API itself, so generate code out of that, or a static website, which you can use for documentation.
So the customer could look at that, and also your third party, which is implementing the other side of your API call. You can both look at that same schema and update it in real time.
And that works very well.
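To make that concrete, a minimal AsyncAPI-style document might look roughly like this; the channel name and payload fields are invented for illustration, not taken from any real deployment:

```yaml
asyncapi: '2.6.0'
info:
  title: Material Consumption Events
  version: '1.0.0'
channels:
  material/consumed:
    publish:
      message:
        name: MaterialConsumed
        payload:
          type: object
          required: [id, materialLot, quantity]
          properties:
            id:
              type: string
              description: Unique message ID, used for deduplication
            materialLot:
              type: string
            quantity:
              type: number
```

One document describes both the channel (the part-six concern) and the message structure (the part-five concern), and tooling can turn it into code stubs or a static documentation site.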
[David]
Oh, excellent. So we know the when. We built in the sequence diagrams.
We know the what. And now we have a way to document how we’re going to go about implementing all these things so that the developers can then build against exactly what’s going to happen. You talk a little bit about schema validation.
I think we’re going to revisit that topic here shortly. But there’s a great way that now anybody who wants to consume information or in exchange this information, a lot of that’s all documented and ready to go.
[Jeroen]
Yep.
Considerations when working with DataOps tools and brokers - Schema Validation
[David]
Perfect. So one of the architectures that I’ve run across for doing transactional data is utilizing some kind of DataOps tool. That’s where we’re modeling data that has multiple sources of that information.
It’s creating these semantic data models. And then we’re going to exchange that information through an MQTT broker. And I just want to focus on just purely the broker itself.
There’s a lot of them out there that have all types of additional capability within it. But for the sake of this discussion, let’s just talk about we’re going to use MQTT for the exchange. I don’t think this is a bad approach, but I think there’s some considerations that need to be made when we’re going to do something like that.
So could you talk a little bit about just some of the, hey, make sure that you consider this, make sure you consider that if we’re going to do something like that. Because again, we have to work with the tools that we can support, but there may be some things that might get overlooked. And I think it’s important that we understand that within the context of transactional data.
Could you speak a little to that?
[Jeroen]
So if we look at DataOps, it’s an application that takes information from one application, transforms it, and puts it into MQTT or some kind of broker. And maybe that same application, or a different DataOps application, can pull that data out and put it into the other one. So it’s kind of an application that exchanges data.
I think if we look at transactional data, it’s very important that the end-to-end validation, with your acknowledgments, stays in place. One of the things that is implemented in part five, and also in the B2MML implementation of it, is that in the header you can say: hey, I’m the sender of this information, and I want to get a confirmation from this and this application. So the message already tells you which applications the sender expects to be listening and which ones it needs a confirmation from.
The DataOps application needs to take that into consideration as well. It can distribute that information to every system, but it needs to think: okay, for this application, I want to make sure that the data is processed and that there is an acknowledgment message going back. So if you put DataOps in between, the DataOps should also help get that acknowledgment message across from one application to the other, or maybe one to many, where that is needed.
For some messages, that’s always needed. For some messages, it’s never needed. But where there is a gray area, it needs to be defined.
And your DataOps application needs to make sure that that is respected.
[David]
One of the concepts you brought up earlier was this schema validation. So what exactly is that and why is that important here?
[Jeroen]
Yeah, so typically in MQTT, you define a lot of topics. Those topics can be defined following part six. That describes what that topic structure should look like.
But still, a message posted on that topic can have any format. You can have a very flat message with just some tags and a value, or you can have a very nested object.
And how do the sender and the receiver know how to read that message, and where to find a certain value within that message structure? That is where part five, and also B2MML, comes in, as they describe where a certain value is located in your message structure. So the sender and the receiver basically talk the same language.
So if I talk about schema validation: when a message comes in, you can have a schema definition, either an XML or a JSON schema, that says, okay, the message that I receive should have this structure, and these values should be filled in. Or maybe it’s an enum value, where the value can only have these options.
All that validation you can do within XML or within JSON. And that should be done by the receiver, and by the sender as well, I would say.
Although if the sender’s own message fails the validation schema, the implementation went wrong somewhere; it’s still a good check to do. But your DataOps can also check that information:
hey, is it valid against the schema that we talked about?
[David]
And so if there is a problem, clearly, we need to have some kind of error handling. So we already documented earlier, these are all the things that could go wrong here. So we probably, yes, I need an acknowledgment that I’m good.
Sounds like we also need an acknowledgment that things aren’t so good.
[Jeroen]
Yeah, I think there are different layers of acknowledgment. The first one is that I receive data, but I cannot read the data. That’s the first level.
You don’t even get to reading an actual value or processing the values. A message comes in, and you don’t understand what’s written there, so it doesn’t match the validation schema.
So you can directly send that back. And when I say send back, I mean you acknowledge with an error message: invalid schema. And then the second layer of validation is not the schema, but the data validation.
So do I have all the data points which I need to process this message? And is that data within
the range that I expect? Things like that.
So I think there are multiple layers of validation.
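Those two layers could be sketched in plain Python like this. In practice you would run a real JSON Schema validator (for example, the jsonschema package) against your B2MML-derived schema; the field names and rules here are purely illustrative.

```python
# Layer-1 (structure) and layer-2 (data) checks, hand-rolled for illustration.
SCHEMA = {
    "required": ["id", "materialLot", "quantity"],
    "enums": {"unitOfMeasure": {"kg", "g", "l"}},
}

def validate_message(msg: dict) -> list[str]:
    errors: list[str] = []
    # Layer 1: structural check -- can we read the message at all?
    for field in SCHEMA["required"]:
        if field not in msg:
            errors.append(f"invalid schema: missing field '{field}'")
    # Layer 2: data check -- are the values within the expected ranges?
    if "quantity" in msg and not isinstance(msg["quantity"], (int, float)):
        errors.append("invalid data: quantity is not numeric")
    for field, allowed in SCHEMA["enums"].items():
        if field in msg and msg[field] not in allowed:
            errors.append(f"invalid data: {field} must be one of {sorted(allowed)}")
    return errors

print(validate_message({"id": "1", "materialLot": "L1", "quantity": 5}))  # []
```

An empty list means send the positive acknowledgment; anything else goes back to the sender as an error message.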
[David]
All right. And either way, when something is bad, we need to have the unhappy path very clearly defined as well.
[Jeroen]
Yes. But again, with async communication, you always need to build that yourself. So that needs to be part of that contract that you say, this is the way that I do error handling.
It’s not like if you have a REST call, for example, you get an error message back. Those are all defined. So you’ve got that additional layer.
That additional layer is missing in async.
[David]
All right. Excellent. So coming back to the very beginning, transactional data, an event, something that’s of interest has occurred.
We’re now going to get some data exchanged between systems; that’s going to be this transactional data. The approach to working with transactional data starts with a sequence diagram of when all these things are going to occur.
There’s the happy path, the unhappy path. Then, what’s the information that we’re going to exchange? And a good approach for that is, as I said earlier, if you don’t have a standard for doing this, see if a standard exists.
Of course, we’re big fans of the ISA-95 standard. That’d be the piece. And then, of course, then finally document the systems that you’re going to build so that other people can either develop against that.
Or now, if I’m a consumer of that system and I want to get information out, I know what’s going to be available to me. Whether I want to do that synchronously or asynchronously becomes part of that documentation. And then finally, finishing it up.
Certainly, there is a common approach using DataOps and a broker. Yep, you can absolutely do that. Here’s some of the things you want to consider as you’re building that in because it’s not necessarily inherent to that.
So I think that’s a really good summary of what we have just discussed. Jeroen, is there anything else you’d like to finish off or just one final thought that you have that would be a
good takeaway for working with transactional data?
[Jeroen]
No, just one last thing to add. I think you had a perfect summary. So thank you.
Thank you for that. I think it’s always important to look at the type of data and look at your process and then think about what is the best method of handling this data. There’s no right or wrong.
It’s just that one structure fits better for that application for this data exchange than the other.
[David]
Excellent. I think that’s a very good summary there. There’s no silver bullet.
There’s no one right way to get at it. So just understand here’s transactional data and there’s a lot of things to consider when it is you’re doing that. So thank you again for joining us on this episode of the Rhize Up podcast and look forward to seeing you on future episodes.
Enjoy the rest of your day.
[Jeroen]
Thank you for having me. Bye.