Different ways to keep your Interface from processing duplicate files

One of the most frequent requirements for interfaces that implement source File adapters is to somehow keep them from processing duplicate files.

Recently we faced this requirement on our project. Our goal was a solution that would be generic and easy to implement. Below are a few of the best solutions we identified. They can be implemented independently, or aspects of each can be picked and merged to provide a *better* solution.

The author is fully aware that there are other ways to do this.

Solution 1: The good old Adapter Module

A search on SDN turned up an excellent article by Sandeep Jaiswal.

To implement Sandeep's approach, we had to make some changes to suit our requirements. We decided to archive duplicate files instead of deleting their contents, so we replaced the DeleteFile() function with an ArchiveFile() function:

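import java.io.File;
import java.io.IOException;
import java.sql.Timestamp;
import java.util.Date;

// Moves a duplicate file into the archive directory instead of deleting its contents
public void ArchiveFile(String fileName, String archiveLocation) throws IOException {
    File myFile = new File(fileName);
    File archiveDirectory = new File(archiveLocation);

    // New name: <original>_duplicate_<timestamp>; ':' is replaced because it is not allowed in Windows file names
    String archivedName = myFile.getName() + "_duplicate_"
            + new Timestamp(new Date().getTime()).toString().replace(':', '-');

    // renameTo() signals failure through its return value, not through an exception
    if (!myFile.renameTo(new File(archiveDirectory, archivedName))) {
        throw new IOException("Could not move " + fileName + " to " + archiveLocation);
    }
}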
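If your PI release runs on Java 7 or later (an assumption; older releases do not), java.nio.file.Files.move is an alternative to File.renameTo() that throws an exception explaining why a move failed instead of silently returning false. The sketch below is a standalone variant of ArchiveFile() under that assumption; the class name ArchiveFileNio is our own, and its timestamp format only approximates the one produced by Timestamp.toString().

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.text.SimpleDateFormat;
import java.util.Date;

public class ArchiveFileNio {

    /**
     * Moves fileName into archiveLocation as <name>_duplicate_<timestamp>.
     * Files.move throws an IOException describing the cause (missing directory,
     * missing permissions, ...) instead of returning false like File.renameTo().
     */
    public static Path archiveFile(String fileName, String archiveLocation) throws IOException {
        Path source = Paths.get(fileName);
        // Colon-free timestamp so the archived name is also valid on Windows
        String stamp = new SimpleDateFormat("yyyy-MM-dd HH-mm-ss.SSS").format(new Date());
        Path target = Paths.get(archiveLocation).resolve(source.getFileName() + "_duplicate_" + stamp);
        return Files.move(source, target);
    }
}
```

Like the original, this only makes sense when the module can see the file on a local or NFS-mounted file system.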

This only works when the sender File adapter uses the NFS transport protocol (not FTP), because the module renames the file directly on the file system. The archive location is passed to the adapter module as a module parameter.

From an XI architectural perspective, it is correct to stop the processing of a duplicate early, in the Adapter Engine (which is what this implementation does).

This adapter module was easily the most generic and easy-to-use solution to the duplicate file problem. But if I had to be very critical, I would point out the following *flaws*:

1) It is not good practice to use file I/O from within an adapter module; in fact, it is not good practice to use file I/O from any Java bean. Using Java I/O anywhere within XI (Adapter Engine or Integration Engine) can be a serious performance hit. We will come back to this point later in the article.

2) The error and alert messages generated when a duplicate file is encountered are not what we expected: we never actually see the error that is thrown in the code:

throw new ModuleException(fileName.trim() + " has already been processed by SAP PI");

Instead, we only see an uninformative generic error.


In our effort to be perfect, we looked for alternatives.

Solution 2: Why not in Message Mapping?

A UDF can be implemented in the message mapping to check whether the file has already been processed. The function DuplicateFileCheck() accepts the name of a database file (a file holding the list of processed file names) as a parameter. If the database file does not exist yet, it is created:

// UDF body. Input: processedFile[0] = path of the database file; output: result.
// Requires java.io.* and java.util.* in the UDF imports.
String processedFileDatabase = processedFile[0];

// Read the source file name from the dynamic configuration; the sender File channel
// must have adapter-specific message attributes (file name) enabled, otherwise it is null
DynamicConfiguration attrib = (DynamicConfiguration) container.getTransformationParameters().get(StreamTransformationConstants.DYNAMIC_CONFIGURATION);
DynamicConfigurationKey fileKey = DynamicConfigurationKey.create("http:/" + "/sap.com/xi/XI/System/File", "FileName");
String sourceFileName = attrib.get(fileKey);

// Create the database file on first use
File fileDB = new File(processedFileDatabase);
if (!fileDB.exists()) {
    fileDB.createNewFile();
}

// Load the names of all previously processed files, one per line
Vector fileNameList = new Vector();
BufferedReader br = new BufferedReader(new FileReader(fileDB));
try {
    String name;
    while ((name = br.readLine()) != null) {
        fileNameList.add(name);
    }
} finally {
    br.close();
}

boolean fileAlreadyProcessed = fileNameList.contains(sourceFileName);

// Remember a new file name by appending it to the database
if (!fileAlreadyProcessed) {
    Writer output = new BufferedWriter(new FileWriter(fileDB, true));
    try {
        output.write(sourceFileName + "\r\n");
    } finally {
        output.close();
    }
}

// "true" means the file is new and may be processed
result.addValue("" + !fileAlreadyProcessed);
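To make the logic of DuplicateFileCheck() easier to test outside PI, here is a minimal sketch of the same check-and-record step in plain Java (7 or later), with the SAP mapping API stripped away. The class and method names are hypothetical; the database format (one file name per line) matches the UDF above.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;
import java.util.HashSet;
import java.util.Set;

public class DuplicateFileCheckSketch {

    /**
     * Returns true if fileName is new (and records it in the database file),
     * false if it is already listed there.
     */
    static boolean isNewAndRecord(File database, String fileName) throws IOException {
        database.createNewFile(); // does nothing if the file already exists

        // Load every processed file name; readLine() strips the "\r\n" terminator
        Set<String> processed = new HashSet<>();
        try (BufferedReader br = new BufferedReader(new FileReader(database))) {
            String line;
            while ((line = br.readLine()) != null) {
                processed.add(line);
            }
        }
        if (processed.contains(fileName)) {
            return false;
        }
        // Append the new name so later messages see it as a duplicate
        try (Writer out = new BufferedWriter(new FileWriter(database, true))) {
            out.write(fileName + "\r\n");
        }
        return true;
    }
}
```

Like the UDF, it reads the whole database on every message and does not lock the file, so two messages carrying the same file name at the same moment could both be treated as new.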

Implemented as listed above, the code reintroduces Java I/O and hence the performance hit described in Solution 1. An alternative to Java I/O would be to access a table through the Dynamic Configuration class; the table would hold the list of all processed files instead of the database file.

In the graphical message mapping, only the target root node is mapped using the DuplicateFileCheck() function.

This implementation gets the job done for *most* of our interfaces. The problem is that it requires a message mapping, which not all of our interfaces have. For example, we have a couple of interfaces that only move *.zip files from one location to another. Another problem with this approach is that we cannot save or keep a copy of the duplicate source files, since the adapter engine will already have archived or deleted them.

Now, before we make XI conservatives cry any more (yes, we know this doesn't belong in the Integration Engine), we'll move quickly on to the third solution.

Solution 3: The flexible Receiver Determination

The third approach requires both an adapter module implementation and some changes to the configuration scenario. Within the adapter module, a check determines whether the source file is a duplicate. If it is, a new XML payload is created for the message, holding the following:

<Root>
<DuplicateFlag>The file mysourcefile.txt has already been processed by SAP PI</DuplicateFlag>
</Root>

The code to build the new payload is given below:

// Standard imports used below: javax.xml.parsers.*, javax.xml.transform.*,
// javax.xml.transform.dom.DOMSource, javax.xml.transform.stream.StreamResult,
// org.w3c.dom.Document, org.w3c.dom.Element, java.io.ByteArrayOutputStream.
// msg (Message), amk (AuditMessageKey) and inputModuleData (ModuleData) come from the
// module's process() method; flag is true when the file was found to be a duplicate.
XMLPayload xmlpayload = msg.getDocument();

if (flag) {
    // Record the duplicate in the message's audit log
    Audit.addAuditLogEntry(amk, AuditLogStatus.ERROR,
            "NewDuplicateFileCheck: File " + fileName + " has already been processed by SAP PI");

    // Build <Root><DuplicateFlag>...</DuplicateFlag></Root> as the new payload
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document document = builder.newDocument();

    Element rootElement = document.createElement("Root");
    Element childElement = document.createElement("DuplicateFlag");
    childElement.appendChild(document.createTextNode("File " + fileName.trim() + " has already been processed by SAP PI."));
    rootElement.appendChild(childElement);
    document.appendChild(rootElement);

    // Serialize the DOM document to bytes
    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    ByteArrayOutputStream myBytes = new ByteArrayOutputStream();
    transformer.transform(new DOMSource(document), new StreamResult(myBytes));

    // Replace the file content with the flag document; no ModuleException is thrown,
    // so the message continues on to receiver determination
    xmlpayload.setContent(myBytes.toByteArray());
    msg.setDocument(xmlpayload);
    inputModuleData.setPrincipalData(msg);
}

Note that the adapter module no longer throws a ModuleException; the message continues with the replacement payload.

In the receiver determination, we check for the DuplicateFlag node using XPath expressions and route the message to different receivers depending on whether the flag exists.
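In PI, this routing condition is entered in the receiver determination's condition editor rather than coded. The sketch below only illustrates the decision it makes, using the standard javax.xml.xpath API: the expression boolean(/Root/DuplicateFlag) is true exactly when the flag node exists. The receiver names are hypothetical placeholders, and the sketch assumes the payload is well-formed XML.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class DuplicateFlagRouting {

    // Hypothetical receivers; in PI these are the services in the receiver determination
    static final String DUPLICATE_RECEIVER = "MailNotification";
    static final String NORMAL_RECEIVER = "TargetSystem";

    static String chooseReceiver(String payloadXml) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // boolean(node-set) is true when at least one /Root/DuplicateFlag node exists
        Boolean hasFlag = (Boolean) xpath.evaluate("boolean(/Root/DuplicateFlag)",
                new InputSource(new StringReader(payloadXml)), XPathConstants.BOOLEAN);
        return hasFlag ? DUPLICATE_RECEIVER : NORMAL_RECEIVER;
    }
}
```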

We configure a receiver mail adapter that uses the MessageTransformBean module.


This results in a nice e-mail message displaying the name of the duplicate file that was already processed.

The drawback with this approach is that once again we cannot control how XI handles the duplicate file (it will be either archived or deleted based on the sender adapter settings).


Solution 4: Enhanced Receiver Determination - In theory only

Unfortunately we do not have code or screenshots for this solution.

This approach involves using the Dynamic Configuration class to maintain and check a table holding a list of all files that have been processed by the interface. Depending on whether the file is a duplicate, a different receiver is identified (similar to Solution 3).
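Since we have no code for this solution, here is a sketch of the decision an enhanced receiver determination mapping would make, under several assumptions: a HashSet stands in for the table of processed files, the service names BS_TargetSystem and BS_DuplicateFileHandler are hypothetical, and the output follows the Receivers message type from the http://sap.com/xi/XI/System namespace as we understand it (check the exact structure in your release).

```java
import java.util.HashSet;
import java.util.Set;

public class EnhancedReceiverDeterminationSketch {

    // Hypothetical business service names
    static final String TARGET_SERVICE = "BS_TargetSystem";
    static final String DUPLICATE_SERVICE = "BS_DuplicateFileHandler";

    // Stand-in for the table of processed file names
    private final Set<String> processedFiles = new HashSet<>();

    /** Returns a receiver list with one receiver, chosen by the duplicate check. */
    public synchronized String determineReceivers(String fileName) {
        // add() records the name and returns false if it was already present
        boolean isNew = processedFiles.add(fileName);
        String service = isNew ? TARGET_SERVICE : DUPLICATE_SERVICE;
        return "<ns0:Receivers xmlns:ns0=\"http://sap.com/xi/XI/System\">"
                + "<Receiver><Party agency=\"\" scheme=\"\"></Party>"
                + "<Service>" + service + "</Service></Receiver>"
                + "</ns0:Receivers>";
    }
}
```

Because Set.add() both checks and records the name, there is no gap between the lookup and the update; a real table would need the equivalent, for example a unique key on the file name.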

Again, since this is done through enhanced receiver determination, we cannot control what the sender adapter does (or has already done) with the duplicate file.


SAP Developer Network SAP Weblogs: SAP Process Integration (PI)