Thursday, July 31, 2014

BizTalk Toolkit 2.1 ESB Add Namespace Pipeline Component and Encoding

Here is something that tripped me up the other week.  I was working on a Proof of Concept (POC) with BizTalk and writing XML messages to a Red Hat Linux Network File System (NFS) share.  To connect to the NFS share, I followed a great blog post, "How to connect to an NFS with Windows Server 2008 R2" by Randy Aldrich Paulo.  Anyway, what tripped me up was the encoding of the files that were written to the server.

On send ports, I usually use the out-of-the-box XMLTransmit pipeline.  However, this particular POC had a requirement to strip all namespace prefixes from the XML elements contained in the message while retaining the target namespace on the root element.  I thought, no problem, the ESB Toolkit has two pipeline components I can use to solve this.  Specifically, I created a custom send pipeline that used the ESB Remove Namespace Pipeline Component and the ESB Add Namespace Pipeline Component.

So I get the send port created and assign the new pipeline that satisfies the requirements:
  

I run a test through and everything looks fine from a BizTalk perspective.  I inform the Linux Administrator that a new XML message is available for him to review.  He confirms the receipt of the file and I think life is good.  

I get a call back from the Linux Administrator after about 10 minutes with a problem.  Apparently, the file being sent has some unreadable special characters at the start of the XML.  I think to myself, that is strange, maybe it is related to the Byte Order Mark (BOM).  I change the "RemoveBOM" property on the ESB Remove Namespace Pipeline Component to "True", thinking this would fix the problem:



So I run another test through and inform the Linux Administrator that a new file is available.  He confirms the receipt but on inspection of the file the special character problem persists.  Furthermore, he follows up with the fact that the encoding on the XML file is UTF-16 when they are expecting UTF-8.  Again, I think to myself, shouldn't BizTalk be sending this message in UTF-8?  

I go to the tracked message for the send port in the BizTalk Administration Console and look at the MessagePart:



So the character set on the message looks to be "utf-8".  I go back to the Linux Administrator and argue a bit with him until he finally sends me output from PuTTY showing the file properties.  One of those properties is the encoding, which was reported as: Little-endian UTF-16 Unicode text.

So, being my hard-headed self, I set out to prove the Linux Administrator wrong.  I find a cool utility on CodePlex called File Encoding Checker and download the tool.  I create a dummy send port (using the new pipeline) that outputs the file to a local folder on my development BizTalk server.  I check the encoding and this is what I find:





What???  How could this be?  I check the documentation for the ESB Add Namespace Pipeline Component at msdn.microsoft.com and I can find no information about what encoding is used.  Apparently, the ESB Add Namespace Pipeline Component defaults the encoding to UTF-16.  Additionally, the pipeline component has no property for setting the encoding (although the Remove Namespace Pipeline Component does).
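As a side note, this kind of result is easy to cross-check without a dedicated tool, because a UTF-16 little-endian file normally starts with the two BOM bytes FF FE.  The following is only a minimal Python sketch of that idea, not the utility mentioned above; the function names `sniff_bom` and `sniff_file` are made up for illustration, and a file with no BOM at all simply comes back as `None`.

```python
import codecs

# Known byte order marks. UTF-32 LE is listed before UTF-16 LE because
# its BOM (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def sniff_file(path):
    """Hypothetical helper: read only the first four bytes of a file and sniff them."""
    with open(path, "rb") as handle:
        return sniff_bom(handle.read(4))
```

Calling `sniff_file` on a file written by the send port would be one way to see whether it carries a UTF-16 BOM, whatever the tracked message properties claim.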

In order to resolve this problem, I had to create a custom pipeline component which adds a namespace to the root element and uses UTF-8 to encode the output.  An interesting lesson on the Add Namespace Pipeline Component and something to keep in mind when leveraging it.  It would be great if the Add Namespace Pipeline Component could be updated with a property to set the encoding, just like the Remove Namespace Pipeline Component.
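To make the idea concrete, here is a rough sketch of the transformation such a component performs: read the message, put the elements into the target namespace, and write the result out as UTF-8.  This is plain Python using the standard library, not BizTalk or ESB Toolkit code, and the function name `add_namespace_as_utf8` and the namespace `urn:example:orders` are invented for the example.  A real fix would live in a .NET pipeline component.

```python
import xml.etree.ElementTree as ET


def add_namespace_as_utf8(xml_bytes: bytes, namespace: str) -> bytes:
    """Illustrative only: qualify unprefixed elements and re-encode as UTF-8."""
    # The parser honours the BOM / XML declaration, so UTF-16 input is accepted.
    root = ET.fromstring(xml_bytes)

    # Move every element that has no namespace into the target namespace.
    for elem in root.iter():
        if isinstance(elem.tag, str) and not elem.tag.startswith("{"):
            elem.tag = f"{{{namespace}}}{elem.tag}"

    # Map the namespace to the empty prefix so it is written as xmlns="..."
    # on the root instead of an ns0: prefix.
    # Note: this changes ElementTree's process-wide prefix registry.
    ET.register_namespace("", namespace)

    # Serialize explicitly as UTF-8 (xml_declaration needs Python 3.8+).
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # A UTF-16 message with a BOM, similar in spirit to what ended up on the share.
    source = '<?xml version="1.0" encoding="utf-16"?><Order><Id>1</Id></Order>'.encode("utf-16")
    print(add_namespace_as_utf8(source, "urn:example:orders").decode("utf-8"))
```

The point of the sketch is the last step: the output encoding is chosen explicitly rather than left to a default, which is exactly the knob that is missing on the Add Namespace Pipeline Component.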
