The DICOM library has ballooned somewhat since last time :/ On the upside, the basic parser can now handle both of the simplest little-endian encoding schemes, all the 30+ possible value representations (VR - tag content type) and even nested sequences (both explicit and undefined lengths). I need to think about the exposed API and how to make it usable and also about how to get rid of the inelegant staircases of if-then-else in some methods. Ugly but it does actually work. This both surprised and pleased me the first time it ploughed, without error, through a 40MB multiframe file chock full of nested sequences and dumped it in readable form onto the console.
The library is now split out into modules with defined purposes (bytestream parsing, data dictionary, pretty printing and file reading) and only a very small amount of code has to live in IO. The rest of it is pure, albeit in fairly imperative form.
The implicit little-endian encoding is going to cause me a great deal more typing. Unlike the explicit LE encoding which supplies the VR in the tag itself, the implicit encoding expects you to just know what VR corresponds to each numerical tag. The only solution I know to that is to build a dictionary with the numerical tag as the key. This works as expected with the side-benefit that I can store the tag's textual description along with the VR for use when pretty printing. I do wonder how time consuming this will be to construct every time the library's used. There are a huge number of tags defined in DICOM of which I've used 300 and they took long enough to enter. Perhaps that is the price for trying to decode such a huge standard :/
A tough test to come is whether all this data can be successfully re-encoded into a bytestream then a file and read by an existing DICOM library. That will have to wait though, I need to see what I can do with the in-memory representation and that means it's time to build a UI app to look at pretty pictures in. wxHaskell here we come...