Wednesday, August 31, 2005

Web-services: interoperability & encoding

I am currently implementing MSCRM at a customer that is also implementing an enterprise portal using SiteVision, delivered by Objectware's Java department. The SiteVision portal needs information about MSCRM accounts, so I implemented a few .NET XML web-services for the Java guys to use. The web-services provide basic read, create and update operations for accounts and contacts. The 'data transfer object' was simply implemented as a string containing XML that conforms to a predefined XSD schema.

The enterprise application integration server used is BIE (Business Integration Engine; Java open source), chosen by the Java lead developer on this project. Their use of my web-services went quite smoothly, as was to be expected since .NET web-services comply with WS-I BP 1.0 (intro article here). I had even used the WS-I compliance tools to check my web-services.

What brought the integration work to a major halt was testing with data containing Norwegian characters (ÆØÅ). To be more precise, this had already been tested and found to be OK using MSIE as the test client, and using XmlSpy to retrieve data about e.g. an account, modify the data, and finally update the account. I strongly recommend the XmlSpy SOAP Debugger tool for both testing and debugging web-services.

The problem was that although BIE supports web-services using UTF-8, it does not differentiate between UTF-8 as the text encoding of the web-service message and the W3C character encoding rules for UTF-8 encoded XML, which dictate that a Unicode character must be encoded as a &#xHHHH; numeric character reference (NCR; HHHH is the hexadecimal Unicode code point). For more details see 'Can XML use non-Latin characters?'.

The MSCRM 1.2 object model supports and returns XML using the &#xHHHH; character encoding. All strings in .NET are Unicode.
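To make the &#xHHHH; form concrete, here is a minimal sketch of turning a .NET string into numeric character references. It is only an illustration: the class and method names are made up for this post and are not part of MSCRM or of my web-service code.

```csharp
using System.Text;

// Hypothetical helper, for illustration only.
public static class NcrSketch
{
    // Replaces every non-ASCII character with an XML numeric character
    // reference (&#xHHHH;) and leaves plain ASCII untouched.
    public static string ToNumericCharacterReferences(string text)
    {
        StringBuilder sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c < 128)
            {
                sb.Append(c);
                continue;
            }

            int codePoint = c;
            // A surrogate pair is one code point and must become one NCR.
            if (char.IsHighSurrogate(c) && i + 1 < text.Length
                && char.IsLowSurrogate(text[i + 1]))
            {
                codePoint = char.ConvertToUtf32(c, text[i + 1]);
                i++;
            }
            sb.Append("&#x").Append(codePoint.ToString("X")).Append(';');
        }
        return sb.ToString();
    }
}
```

With this sketch, ToNumericCharacterReferences("\u00C6") (i.e. Æ) gives &#xC6; and "\u00E6" (æ) gives &#xE6;.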

The character encoding went through these phases (examples for Æ and æ):

  • from our MSCRM web-service: Æ encoded as &#xC6;
  • back from BIE on create/update: Æ "encoded" as two 'funny' characters (not even valid UTF-8 code units)
  • data in MSCRM after operation: the same two 'funny' characters instead of Æ
  • from our MSCRM web-service: æ encoded as &#xE6;
  • back from BIE on create/update: æ encoded as Ã¦ (which are valid UTF-8 code units)
  • data in MSCRM after operation: Ã¦
To make the problem even worse, running the same BIE test case (workflow route) over and over would double these 'funny' characters each time, as each XML-encoded character from our service was sent back from BIE as two UTF-8 code units and not as an XML NCR (the W3C character encoding standard).
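The doubling is easy to reproduce outside BIE. The sketch below shows one plausible mechanism, under the assumption that the UTF-8 bytes are at some point read back as single-byte ISO-8859-1 characters; I have not verified that this is what BIE actually does internally, and the class and method names are made up for the illustration.

```csharp
using System.Text;

// Hypothetical helper, for illustration only.
public static class MojibakeSketch
{
    // One faulty round trip: write the text as UTF-8 bytes, then
    // (wrongly) read every byte back as one ISO-8859-1 character.
    // ASSUMPTION: ISO-8859-1 is the single-byte encoding involved.
    public static string MisreadUtf8AsLatin1(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.GetEncoding("iso-8859-1").GetString(utf8Bytes);
    }
}
```

Feeding æ through MisreadUtf8AsLatin1 once gives the two characters Ã¦; feeding that result through again gives four characters, and so on for every further run of the test case.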

A nice unicode character encoding test tool is available here.

The Java lead developer had run into these encoding problems in BIE and modified the portal code to do some character replacing (for Æ, etc) on all text to/from BIE on their side. I asked them to change their integration implementation to comply with the WS-I and W3C standards instead. Unfortunately, this lead developer is more interested in hailing the glory of the application architecture & design and the open source movement than complying with standards "invented by Microsoft" :-)

Thus, I had to write a SoapExtension that modifies the incoming SOAP message before it is deserialized, changing the "wrong" UTF-8 encoding into correct XML W3C NCR encoding. I used this GotDotNet sample as the basis for my code, and this is how I modify the incoming and outgoing SOAP messages:

public override void ProcessMessage(SoapMessage message)
{
    switch (message.Stage)
    {
        //INCOMING
        case SoapMessageStage.BeforeDeserialize:
            this.ChangeIncomingEncoding();
            break;
        case SoapMessageStage.AfterDeserialize:
            _isPostDeserialize = true;
            break;

        //OUTGOING
        case SoapMessageStage.BeforeSerialize:
            break;
        case SoapMessageStage.AfterSerialize:
            this.ChangeOutgoingEncoding();
            break;
    }
}

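public override Stream ChainStream( Stream stream )
{
    //http://hyperthink.net/blog/CommentView,guid,eafeef67-c240-44cc-8550-974f5d378a8f.aspx

    if(!_isPostDeserialize)
    {
        //INCOMING: read the raw request from 'stream' and hand the
        //deserializer our own buffer with the fixed message
        _inputStream = stream;
        _outputStream = new MemoryStream();
        return _outputStream;
    }
    else
    {
        //OUTGOING: the serializer writes into our buffer and we copy
        //the result to the real response 'stream'
        _outputStream = stream;
        _inputStream = new MemoryStream();
        return _inputStream;
    }
}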

public void ChangeIncomingEncoding()
{
    //at BeforeDeserialize
    if(_inputStream.CanSeek)
        _inputStream.Position = 0L;

    TextReader reader = new StreamReader(_inputStream);
    TextWriter writer = new StreamWriter(_outputStream);

    string line;
    while((line = reader.ReadLine()) != null)
    {
        writer.WriteLine( Utilities.FixBieEncoding(line) );
    }
    writer.Flush();

    //reset the new stream to ensure that AfterDeserialize is called
    if(_outputStream.CanSeek)
        _outputStream.Position = 0L;
}

public void ChangeOutgoingEncoding()
{
    //at AfterSerialize
    if(_inputStream.CanSeek)
        _inputStream.Position = 0L;

    //change the encoding only if needed; the comparison ignores case
    //to match the case-insensitive regex below
    bool changeEncoding = _encoding != null
        && string.Compare(_encoding, "utf-8", true) != 0;

    Regex regex = new Regex("utf-8", RegexOptions.IgnoreCase);
    TextReader reader = new StreamReader(_inputStream);
    //HACK: TextWriter writer = new StreamWriter(_outputStream);
    //Encoding.GetEncoding(null) throws, so use the default writer then
    TextWriter writer = _encoding != null
        ? new StreamWriter(_outputStream, System.Text.Encoding.GetEncoding(_encoding))
        : new StreamWriter(_outputStream);

    string line;
    while((line = reader.ReadLine()) != null)
    {
        if(changeEncoding)
            line = regex.Replace(line, _encoding);
        writer.WriteLine(line);
    }
    writer.Flush();
}

The central method here is ChangeIncomingEncoding, which converts all the "wrong" UTF-8 encodings from BIE into correct XML W3C NCRs. Note the resetting of the position to zero on the output stream after modifying the message; this is important, as forgetting to do so will cause the AfterDeserialize stage in ProcessMessage not to be called.
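The Utilities.FixBieEncoding helper is not listed in this post. As a rough idea of what such a fix can look like, here is a minimal sketch, not the actual helper. It assumes the "wrong" text consists of two-byte UTF-8 sequences that show up as two ISO-8859-1 characters (which covers the Norwegian letters), and the class and method names are invented for the illustration.

```csharp
using System.Text.RegularExpressions;

// Hypothetical stand-in, NOT the real Utilities.FixBieEncoding.
public static class BieEncodingSketch
{
    // ASSUMPTION: a two-byte UTF-8 sequence that was read as two
    // single-byte characters: a lead byte 0xC2-0xDF followed by a
    // continuation byte 0x80-0xBF.
    private static readonly Regex MisreadPair =
        new Regex("[\u00C2-\u00DF][\u0080-\u00BF]");

    // Replaces every misread pair with the NCR of the intended character.
    public static string RepairToNcr(string line)
    {
        return MisreadPair.Replace(line, new MatchEvaluator(ToNcr));
    }

    private static string ToNcr(Match m)
    {
        int lead = m.Value[0];
        int continuation = m.Value[1];
        // Standard two-byte UTF-8 decoding: 110xxxxx 10yyyyyy -> xxxxxyyyyyy
        int codePoint = ((lead & 0x1F) << 6) | (continuation & 0x3F);
        return "&#x" + codePoint.ToString("X") + ";";
    }
}
```

For example, RepairToNcr turns the pair Ã¦ into &#xE6; and leaves plain ASCII text alone. The obvious limitation is that legitimate text that happens to contain such a character pair would be rewritten too, and that a different single-byte misreading (e.g. Windows-1252) would need a different pattern.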

After deploying the modified web-service and testing it with XmlSpy, it was time to test it with the BIE workflow dashboard. The BIE developer had set up some test cases for me, and they all worked as expected. No more funny farm in the MSCRM database.

What an illustrious victory for Java-.NET web-service interoperability!

Note that you cannot test/debug your SoapExtension code by using MSIE as the test client, as HTTP POST/GET will not trigger the SoapExtension. I used XmlSpy to test and debug my code; just set some breakpoints, start your web-service in debug mode and leave it running, then trigger the SoapExtension by making a SOAP request using XmlSpy.
