Tuesday, May 1, 2012

XML anonymization

You may have been in a situation where you needed to ask a very specific question on a public forum or mailing list, and the question was tied to particular XML content.
And possibly that content should not be seen by the public.
I was, so I checked the options.

How can I anonymize XML?
Google pointed me to xmlanonymizer (http://code.google.com/p/xmlanonymizer/).
After some struggling I even found a way to use it.

To get the very latest version, you need to import the SVN-hosted source and build it yourself:
- import the SVN project into Eclipse (http://xmlanonymizer.googlecode.com/svn/trunk/); I imported it as a Java project afterwards
- Configure -> Convert to Maven project (you need the m2e plugin installed in Eclipse to do this)
- build it as a Maven project (for example via the CLI: mvn clean install)
- anonymize your content:
java -jar target/XMLAnonymizer-0.0.2-SNAPSHOT.jar in.xml -outfile=out.xml -overwrite=true
 where in.xml is the XML you need to anonymize and out.xml is the output file

OK, so now you (should) have it running.
 By default, only XML text and attribute values are anonymized.

But what if you need to go a step further?
I summed up my needs in the bug report there (http://code.google.com/p/xmlanonymizer/issues/detail?id=1):
- I'd need element names anonymized as well
- I'd like anonymization to pick a random digit/char rather than the next digit/char in the sequence (for an XML with very many elements I ran into a situation where no more element names were left => endless loop)
- I'd like the anonymized characters to keep their case (lower/upper)

and finally I attached a patch, as the implementation was quite easy.
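
The patch itself is not reproduced in this post. As a rough illustration of the three wishes listed above, here is a minimal Java sketch. The class name RandomAnonymizerSketch and its methods are made up for this example; they are not part of xmlanonymizer or of the attached patch. It assumes plain ASCII letters and digits and leaves every other character (including non-ASCII letters) untouched, so it is a starting point rather than a complete anonymizer.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Random;
import java.util.Set;

// Hypothetical sketch, not the xmlanonymizer implementation.
public class RandomAnonymizerSketch {
    private final Random random;
    // Remembers the replacement chosen for each original element name,
    // so opening and closing tags (and repeated elements) stay consistent.
    private final Map<String, String> nameMap = new HashMap<>();
    // Replacement names already handed out, to avoid two originals
    // collapsing into the same anonymized name.
    private final Set<String> usedNames = new HashSet<>();

    public RandomAnonymizerSketch(long seed) {
        this.random = new Random(seed);
    }

    // Replaces each ASCII letter with a random letter of the same case and
    // each ASCII digit with a random digit; other characters are kept as-is.
    public String anonymizeText(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c >= 'a' && c <= 'z') {
                out.append((char) ('a' + random.nextInt(26)));
            } else if (c >= 'A' && c <= 'Z') {
                out.append((char) ('A' + random.nextInt(26)));
            } else if (c >= '0' && c <= '9') {
                out.append((char) ('0' + random.nextInt(10)));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Returns a stable random replacement for an element name.
    // If random candidates keep colliding, the name is lengthened so the
    // loop cannot run out of free names.
    public String anonymizeName(String name) {
        String known = nameMap.get(name);
        if (known != null) {
            return known;
        }
        String base = name;
        String candidate = anonymizeText(base);
        int attempts = 0;
        while (!usedNames.add(candidate)) {
            if (++attempts % 10 == 0) {
                base = base + "a"; // grow the space of possible names
            }
            candidate = anonymizeText(base);
        }
        nameMap.put(name, candidate);
        return candidate;
    }

    public static void main(String[] args) {
        RandomAnonymizerSketch anonymizer = new RandomAnonymizerSketch(System.nanoTime());
        String tag = anonymizer.anonymizeName("Customer");
        String text = anonymizer.anonymizeText("John Doe, 42");
        System.out.println("<" + tag + ">" + text + "</" + tag + ">");
    }
}
```

Because letters map to letters of the same case, a name that starts with a letter still starts with a letter, so the result stays a valid XML element name in the simple cases covered here.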


