Processing Files with Integration Host
Transcript
Welcome to this tutorial where I hope to go over some of the Best Practices for Processing Files with Integration Host.
To start with, I'm going to change the receiving activity type to be a Directory Scanner. It's going to be prompting me for a name. I'm not going to use a name, and there's a good reason for that. If I type in the directory, you'll notice that the name is automatically picked up and created using the directory that I've typed in. This can be helpful when I'm viewing the workflow. I can automatically see what this activity is doing, where it's getting the files. However, for some people, it might be more practical to give it a proper name.
The next is the File Filter. This is the type of file that we're going to pick up. I'm going to have this pick up every HL7 file from the c:\temp directory. But if I was using XML or something, it's just a matter of changing that filter to XML. It does support multiple file types, but that's mostly not practical for our purposes.
I'm just going to change that back to HL7, and we'll pick up all HL7 files from the temp directory. We've got the choice to keep waiting for more files to be added to that directory, processing them as they come in, or to stop running once all the files are processed. That will allow you to build up files and then start the workflow again at whatever time suits you. But I generally prefer just to keep processing in real-time, so that's what I'll do. I'll leave it on keep waiting for more files to be added.
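To picture what a directory scanner with a file filter is doing, here is a minimal Python sketch of the general idea. It is not how Integration Host is implemented; the function names, the polling approach, and the `keep_waiting` flag are assumptions made purely for illustration.

```python
import fnmatch
import os
import time


def scan_directory(directory, file_filter="*.hl7"):
    """Return full paths of files in `directory` whose names match `file_filter`."""
    matches = []
    for entry in os.scandir(directory):
        # Compare in lower case so that both .hl7 and .HL7 are picked up.
        if entry.is_file() and fnmatch.fnmatchcase(entry.name.lower(), file_filter.lower()):
            matches.append(entry.path)
    # Oldest first, so files are handled roughly in arrival order.
    return sorted(matches, key=os.path.getmtime)


def run_scanner(directory, handle_file, file_filter="*.hl7",
                keep_waiting=True, poll_seconds=1.0):
    """Poll `directory` and pass each matching file to `handle_file`.

    `handle_file` must move or delete the file, otherwise it is picked up again.
    """
    while True:
        pending = scan_directory(directory, file_filter)
        for path in pending:
            handle_file(path)
        if not pending:
            if not keep_waiting:
                break  # everything is processed, so stop the workflow
            time.sleep(poll_seconds)  # wait for more files to arrive
```

With `keep_waiting=False` the loop ends once the directory is empty of matching files, which mirrors the "stop running once all the files are processed" choice.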
I'll set the message type too. Again, I'm bringing in an HL7 message so the message type would be HL7, but you would change that appropriately. And I'm going to put in a sample message for the HL7. This is just to give the system something that I can use for bindings and stuff. It's a sample file that provides me with the structure of the message. It's not the message being processed, but it does make it much simpler to create my workflow.
Post Processing section:
Do we want to Delete this file after we're finished processing? You will probably do that if you have another system that's sending in the files here, but it's also storing a replication of that file. It has logs that you can go back to and resend from the other system. Also, it could just be a low priority file. It doesn't really matter if this gets through or not. But if it is a high priority file and you can't replicate it from the other system then obviously, you're not going to want to delete it.
You'll probably instead want to Move it to another directory after processing. So, I can click that, and I can put in a path for it to go to, call it c:\tempback.
Then after it's finished processing, it's going to put that file into this directory. So that means you can, of course, group them in a particular directory.
I'm going to take the patient's id number and drag that in to create a directory. It will create a directory that uses part of the data from the message in the directory name. This can help to sort backed up data for you.
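As a rough sketch of the same idea in Python, the snippet below pulls the patient id out of an HL7 v2 message and uses it as a sub-directory of the backup path. The parsing is deliberately simplistic (it takes the first component of PID-3 and ignores repetitions and escape sequences), and `patient_id` and `move_to_backup` are hypothetical names, not part of Integration Host.

```python
import os
import re
import shutil


def patient_id(hl7_message):
    """Return the first component of PID-3, or None if there is no PID segment."""
    for segment in hl7_message.splitlines():
        fields = segment.split("|")
        if fields[0] == "PID" and len(fields) > 3:
            return fields[3].split("^")[0]
    return None


def move_to_backup(path, backup_root, hl7_message):
    """Move a processed file into backup_root/<patient id>/ and return its new path."""
    raw_id = patient_id(hl7_message) or "unknown"
    # Keep only characters that are safe in a directory name.
    safe_id = re.sub(r"[^A-Za-z0-9_-]", "_", raw_id)
    target_dir = os.path.join(backup_root, safe_id)
    os.makedirs(target_dir, exist_ok=True)
    return shutil.move(path, os.path.join(target_dir, os.path.basename(path)))
```

Sanitising the id before using it in a path is a design choice of this sketch, since message data can contain characters that are not valid in directory names.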
Then we've got the Error Action. The Directory Scanner is a little bit different to the other activity types. The other ones tend to have one system pumping data into your workflow. If something goes wrong, it's really for the other system to send you that data again. In the file system, that's just not possible. If an error happens, how do you tell the file system to reprocess that message? You can't, so what we've got for Directory Scanners is an error action by default that will Stop the workflow and allow you to come and look at it. Probably not great for production but great for testing, so you can see it stop and you can go back and have a look at what happened.
It can Retry until successful. Now that's good for connections across a network: if the network goes down, you're going to want to retry until it can get that message through, particularly in mission-critical systems. Do be aware, though, that it is going to keep retrying. It's going to be processing over and over again if an error occurs. If it's the type of error that cannot be resolved, maybe it's writing to an invalid file name for instance, then no matter how many times it retries, it's going to get the same error. So often you would only set it to retry until successful on a live production system where you've thoroughly tested your workflow and made sure it's going to work correctly in all scenarios. You can still, however, go into Integration Host while it is retrying and just stop the workflow if things are going wrong, so it's worth keeping that in mind.
Move the file to a directory; that's a great one. Sometimes data could be funny, so you can have a directory that acts as a sort of retry directory. That allows you to say, "If this file fails, we're going to shift it into this other directory." Then you can monitor that directory yourself, maybe even with a different process, and re-handle the file, or maybe have an email sent to you if things are going wrong; that's up to you. What it does allow is for the rest of the messages to continue being processed. If you've got a system where one bad message stops the whole thing from processing, but all those other messages should still be processed, you're going to want to set this to move the file to a directory. It takes the file with the problem, puts it aside, and then allows everything else to keep going.
Finally, there's Delete file. Makes sense for some systems, perhaps temporary stuff that is not critical at all. Once the file errors, it just shifts it out of the way, it deletes it, and it's going to continue processing the rest of the files.
I'm going to set that to 'Move to a directory' and then I can place in another one.
I will put in this c:\temperror\. So that is your directory scanner. Let's have a look at what happens when we now write out that file again.
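The four error actions can be pictured with a small Python sketch. This is only an illustration of the behaviour described in this tutorial, not Integration Host's code; `ErrorAction`, `WorkflowStopped`, and `process_with_error_action` are made-up names.

```python
import os
import shutil
import time
from enum import Enum


class ErrorAction(Enum):
    STOP = "stop"      # halt the workflow so someone can investigate
    RETRY = "retry"    # retry until successful
    MOVE = "move"      # move the file to an error directory and carry on
    DELETE = "delete"  # delete the file and carry on


class WorkflowStopped(Exception):
    """Raised when the STOP action halts the workflow."""


def process_with_error_action(path, process, action, error_dir=None, retry_delay=1.0):
    """Run process(path); on failure apply `action`. Returns True if processed."""
    while True:
        try:
            process(path)
            return True
        except Exception as exc:
            if action is ErrorAction.STOP:
                raise WorkflowStopped(f"failed on {path}") from exc
            if action is ErrorAction.RETRY:
                # Loops forever if the error can never be resolved.
                time.sleep(retry_delay)
                continue
            if action is ErrorAction.MOVE:
                os.makedirs(error_dir, exist_ok=True)
                shutil.move(path, os.path.join(error_dir, os.path.basename(path)))
                return False
            os.remove(path)  # ErrorAction.DELETE
            return False
```

Notice that only `MOVE` and `DELETE` return control to the caller after a failure, which is why they are the options that let the remaining files keep flowing.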
I'm going to add another activity, and I'm going to set this to a File Writer. Again, the same thing applies. If I don't give it a name, it will use the path that I type in to generate the name. In this case, we're not providing it with a directory; we are providing it with a full file name. I'm just going to call it 'file.csv'. So that's great! It's going to write out, and it's going to call it file: c:\tempOut\.csv. There's a problem with that. It's always going to be called file.csv. Let me discuss a few of the ways around this problem, so you're not writing to the same file name every time. What we are writing is a CSV, so I'm just going to change that across to CSV. CSVs support multiple records per file; in this case, the max records per file is set to 5000. It's going to write 5000 lines of data into that file before it moves on. After 5000, it would then move the file to another directory.
If you don't move it to another directory and you've set it to a fixed file name like this, it's going to end up just overwriting itself every 5000 records. You don't want to configure it that way, so you want to make sure that you move it to another directory after processing. I'll call this one c:\temp\out.
There's my directory that I'm going to be writing it to. And the great thing about that is when it does move the file to that directory, it's going to make sure that the file name is unique automatically. It doesn't matter that I've given it a fixed file name; you could always consider this to be a temporary file, and then when it moves it into that directory, it's going to append some characters on at the end. It will guarantee that the file name is unique, but it will change it. However, there is another way that I could have made sure the file name was always created uniquely. I could use a value from the workflow. The first thing would be, perhaps I could use the time.
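One common way to get that "never overwrite on move" behaviour is to check for a clash and add a suffix. The Python sketch below uses a simple counter; the exact characters Integration Host appends are not stated in this tutorial, so treat the naming scheme and the function names here as assumptions.

```python
import os
import shutil


def unique_destination(directory, filename):
    """Return a path in `directory` that does not exist yet, based on `filename`."""
    stem, ext = os.path.splitext(filename)
    candidate = os.path.join(directory, filename)
    counter = 1
    while os.path.exists(candidate):
        # Append a counter before the extension: file.csv -> file_1.csv, file_2.csv, ...
        candidate = os.path.join(directory, f"{stem}_{counter}{ext}")
        counter += 1
    return candidate


def move_without_overwrite(path, directory):
    """Move `path` into `directory`, renaming it if the name is already taken."""
    os.makedirs(directory, exist_ok=True)
    destination = unique_destination(directory, os.path.basename(path))
    return shutil.move(path, destination)
```

The check and the move are separate steps, so this sketch is only safe when a single process writes to the output directory.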
Let's have a look at that. I could change the file name. I right-click in the right place, say 'insert variable'. And then I could say the current date and time.
That's now going to put the current date and time into the file name. It's going to write to c:\temp\file and then, you know, the HL7 format of year, month, day, down to the second, then .csv. So that's great!
Now I've got a unique file name, but of course, that's only unique down to the second. If you write multiple files per second, then it's no longer going to be unique. I don't recommend using the current date and time. Instead, what I'd like you to use if you want to make it unique is insert variable, then the WorkflowInstanceId. Now the WorkflowInstanceId, it's just an id that increments with each iteration of your workflow, and so that will guarantee that you'll have a unique file name.
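The difference between the two naming schemes is easy to see in a few lines of Python. This is just an illustration: the `workflow_instance_id` parameter stands in for the WorkflowInstanceId variable, and both function names are hypothetical.

```python
from datetime import datetime


def timestamp_file_name(prefix, now):
    """Name a file with a timestamp that is precise to the second."""
    return f"{prefix}{now.strftime('%Y%m%d%H%M%S')}.csv"


def instance_file_name(prefix, workflow_instance_id):
    """Name a file with an id that increments on every workflow run."""
    return f"{prefix}{workflow_instance_id}.csv"


# Two messages that arrive within the same second.
first = datetime(2024, 1, 1, 12, 0, 0, 100000)
second = datetime(2024, 1, 1, 12, 0, 0, 900000)

# The timestamp names collide, the instance-id names do not.
print(timestamp_file_name("file", first) == timestamp_file_name("file", second))
print(instance_file_name("file", 41) == instance_file_name("file", 42))
```

The first comparison is true and the second is false, which is the whole argument for preferring an incrementing id over the clock.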
You're not stuck with just those values. You are welcome to also take values out of your incoming message. I can go down to the patient's name and maybe the family name, for instance. I'll just drop that in there, and now it will also have the family name appended. That could be helpful if you want to make the files just more identifiable.
Incidentally, now that we've got it like that, when it moves it to the output directory, it's already going to be unique, so you're not going to end up with a newly generated file name unless there is some conflict with a file from a previous run. Now all that makes good sense for CSV, but if it was XML or JSON files, you aren't going to want to write multiple records per file. It tends to make sense just to write one record per file, and that way you'll end up with a newly generated file each time. So that applies mostly to XML and JSON files. But with HL7 it's up to you; sometimes people prefer single records per file, and sometimes they prefer to have maybe a day's worth of messages in one file, and it's all just supported. The HL7 format kind of handles either one, it doesn't matter. But CSV wants a big number in there.
One more thing I wanted to point out with moving the file to another directory after processing: I think this is an excellent practice if you've got another system that's going to pick up and process your file afterwards. The reason is that you can create your file in a temp directory. It can be built up, and only once it's completed is it moved into another directory. Then the other system can process it in its completed state. If you don't use that mechanism, particularly when you're writing out say 5000 records into the file, it can take a long time for that file to be built up, and you don't want your other system to detect that file, grab it, and start processing it too early. You're going to end up losing records, so if the file is going to be processed by another system, always make sure you set this to move it into another directory when it finishes processing. If you're just archiving the data, it doesn't matter. You don't need to do it that way. You could write it straight into the correct directory.
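The "build it in a temp location, then move it when complete" pattern, combined with a maximum number of records per file, might look like this in Python. It is a sketch of the general technique only; `BatchedCsvWriter` and its batch-number naming are assumptions, not the way Integration Host names or moves its files.

```python
import csv
import os
import shutil


class BatchedCsvWriter:
    """Append rows to a temp file and move it out once it holds max_records rows."""

    def __init__(self, temp_path, output_dir, header, max_records=5000):
        self.temp_path = temp_path
        self.output_dir = output_dir
        self.header = header
        self.max_records = max_records
        self._count = 0   # rows written to the current temp file
        self._batch = 0   # completed files moved so far

    def write(self, row):
        new_file = self._count == 0
        # Start a fresh file for each batch, otherwise append to the current one.
        with open(self.temp_path, "w" if new_file else "a", newline="") as handle:
            writer = csv.writer(handle)
            if new_file:
                writer.writerow(self.header)
            writer.writerow(row)
        self._count += 1
        if self._count >= self.max_records:
            self.flush()

    def flush(self):
        """Move the current temp file to the output directory, if it has any rows."""
        if self._count == 0:
            return None
        os.makedirs(self.output_dir, exist_ok=True)
        self._batch += 1
        stem, ext = os.path.splitext(os.path.basename(self.temp_path))
        destination = os.path.join(self.output_dir, f"{stem}_{self._batch}{ext}")
        shutil.move(self.temp_path, destination)
        self._count = 0
        return destination
```

A downstream system that watches only the output directory never sees a half-written file, because files appear there only after `flush` has moved a completed batch. The batch counter restarts with each run, so this sketch assumes the output directory is emptied between runs.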
Now we've got to build up the data that we're writing out, but by default, it was bound of course to the Incoming message because we are changing the types. I will delete that. And now I just need to build up my CSV. I can provide it with the headers for it, or I can just drag in the fields I want. I'll just drag in the patient's id, the family name. Notice it's just putting out the commas for me, makes my life a bit easier.
Now we've got a very simple CSV, and I probably want to give it some Headers too. I've got an id, first name, last name, and I'm just typing these out the way I'd like it to appear in the file. The header line is just a comma-separated list of header names.
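For comparison, here is how a header line plus comma-separated rows could be produced in Python with the standard `csv` module. The `build_csv` helper is hypothetical, and the values are assumed to have already been pulled out of the incoming message.

```python
import csv
import io


def build_csv(headers, records):
    """Return CSV text with a header line followed by one line per record."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(headers)    # e.g. id, first name, last name
    writer.writerows(records)   # values containing commas are quoted automatically
    return buffer.getvalue()
```

Using a CSV library rather than joining strings by hand matters when a value, such as a name, contains a comma of its own.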
I'm going to head back up to the top now. You'll notice that the name has now got the variables in it. The name is looking pretty ugly, so I might just want to change that to 'Write to c:\temp\out', and that just gives a better indication of what this activity is doing. The variables obviously can't be substituted into the name of the activity. In that case, it's probably better to type the name in.
Hopefully, this has gone some way to helping you use HL7 Soup Integration Host to read in files, process them, and convert them out, and has shown you some of the pitfalls that you can encounter when dealing with files: making sure that you're not overwriting the file, making sure that you give it a unique file name, and making sure that other processing systems don't interfere.
As usual, if we've helped you, please give us a like and maybe even consider subscribing. I'd love to hear your comments, and any thoughts on what you'd like from us in the future.