For quite some time I managed to keep the number of “construction sites” in PDCLib to a minimum. Sure, there were plenty of unfinished parts (floating point, multibyte / wide characters, locales, …), but my actual work was focussed on one part of the library only.
Unfortunately, I have strayed a bit from that path and ended up with more “action items” than I am comfortable with. That is why I opened this “drawing board”: to write down my thoughts about all those “construction sites”, to get them organized, and to make the path onward a bit clearer.
With later versions of glibc now finally supporting `<threads.h>`, we can expect to see software emerging that makes actual use of this header (instead of `<pthread.h>`). However, when conducting some more involved tests with my implementation, I also found a couple of severe defects. (The `thrd_exit()` / `thrd_join()` return value handling is broken, for one.)
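A regression test for that defect could look something like the following sketch. It uses only the standard C11 `<threads.h>` interface; the helper names (`exit_worker`, `return_worker`, `run_and_join`) are made up for illustration. Whichever way a thread ends, by calling `thrd_exit()` or by returning from its start function, `thrd_join()` should hand the same value back to the caller.

```c
#include <threads.h>

/* Thread body that reports its result through thrd_exit(). */
static int exit_worker(void *arg)
{
    thrd_exit(*(int *)arg + 1);   /* _Noreturn: control never comes back */
}

/* Thread body that reports its result by returning from the start function. */
static int return_worker(void *arg)
{
    return *(int *)arg + 1;
}

/* Runs fn(&base) in a new thread, joins it, and stores the thread's result
   in *result. Returns thrd_success, or the error from thrd_create()/thrd_join(). */
static int run_and_join(thrd_start_t fn, int base, int *result)
{
    thrd_t t;
    int rc = thrd_create(&t, fn, &base);   /* base outlives the thread: we join below */
    if (rc != thrd_success)
        return rc;
    return thrd_join(t, result);
}
```

With a correct implementation, both `run_and_join(exit_worker, 41, &res)` and `run_and_join(return_worker, 41, &res)` leave `res == 42`.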
With `<threads.h>` we get the ability to handle thread-local storage. While I implemented a thread-local `errno` (which was simple enough), thread-local locale handling might turn out to be a bit more complicated. It might require initializing things, and we don't get to call functions from `_PDCLIB_stdinit.c` …
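One possible way around the missing initialization hook is lazy initialization: each thread's copy starts out zeroed, and the first access fills in defaults. This is a minimal sketch assuming a compiler with C11 `_Thread_local`; the `struct thread_locale` record and `get_thread_locale()` are hypothetical and not how PDCLib actually stores its locale data.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical per-thread locale record; the fields are illustrative only. */
struct thread_locale {
    bool initialized;
    char decimal_point[8];
    char thousands_sep[8];
};

/* Every thread gets its own zero-initialized copy (initialized == false). */
static _Thread_local struct thread_locale current_locale;

/* Returns the calling thread's locale data, filling in "C" locale defaults
   on first use, because there is no per-thread startup code to do it. */
static struct thread_locale *get_thread_locale(void)
{
    struct thread_locale *loc = &current_locale;
    if (!loc->initialized) {
        strcpy(loc->decimal_point, ".");
        strcpy(loc->thousands_sep, "");
        loc->initialized = true;
    }
    return loc;
}
```

A change made through `get_thread_locale()` in one thread is invisible to every other thread, which each see fresh "C" defaults on their first call.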
The idea was to write a function `_PDCLIB_load_lc_<category>` for each locale category (collate, ctype, monetary, numeric, time, messages). This worked rather well at first. For ctype, I delved into Unicode, getting the “right” character classes directly from the Unicode database (`auxiliary/uctype`).
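For reference, the character class data comes from lines like `0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;` in `UnicodeData.txt`, where the first field is the code point in hex and the third is the General_Category. A sketch of pulling out those two fields might look like this. It is not the code in `auxiliary/uctype`, and treating every `L*` category as alphabetic is a deliberate simplification (the Unicode Alphabetic property is broader).

```c
#include <stdlib.h>
#include <string.h>

/* Extracts the code point (field 0) and the two-letter General_Category
   (field 2) from one line of UnicodeData.txt. Returns 1 on success, 0 if
   the line does not have the expected shape. */
static int parse_unicode_data_line(const char *line, unsigned long *cp, char gc[3])
{
    char *end;
    *cp = strtoul(line, &end, 16);
    if (end == line || *end != ';')
        return 0;
    const char *sep = strchr(end + 1, ';');   /* end of field 1 (the name) */
    if (sep == NULL || strlen(sep + 1) < 2 || sep[3] != ';')
        return 0;
    gc[0] = sep[1];
    gc[1] = sep[2];
    gc[2] = '\0';
    return 1;
}

/* Deliberately rough: treat Lu, Ll, Lt, Lm and Lo as "alpha". */
static int gc_is_alpha(const char gc[3])
{
    return gc[0] == 'L';
}

/* Nd (decimal digit) is the category behind digit-like classes. */
static int gc_is_decimal_digit(const char gc[3])
{
    return gc[0] == 'N' && gc[1] == 'd';
}
```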
Then I wanted to do the same for collate (sorting equivalence), and this was where I got stuck. Unicode collation is a pretty big subject: information about it is scattered over multiple chapters of the Unicode standard, and even across multiple documents. In a kind of repeat performance of the block I had with `<stdio.h>`, I did not find the necessary uninterrupted time to really grasp what was before me.
The thing to do here would be to identify which data from which Unicode input files I would need, and in which format, in order to implement (initially) `strcoll` and `strxfrm`. Ideally, whatever architecture I come up with would also serve for (upcoming) multibyte and wide character collation.
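Whatever the data ends up looking like, the standard fixes one property the two functions must share: comparing two `strxfrm()` results with `strcmp()` must give the same ordering as `strcoll()` on the original strings. A small checker for that property could serve as a test once real collation data is in place (the helper names are hypothetical):

```c
#include <stdlib.h>
#include <string.h>

/* Sign of an int: -1, 0 or 1. */
static int sign(int v)
{
    return (v > 0) - (v < 0);
}

/* Returns 1 if strcmp() on the strxfrm() results orders a and b the same
   way strcoll() does, 0 if not, and -1 if memory ran out. */
static int xfrm_agrees_with_coll(const char *a, const char *b)
{
    /* With n == 0, strxfrm() may take a null pointer and just reports the
       length of the transformed string (excluding the terminator). */
    size_t na = strxfrm(NULL, a, 0) + 1;
    size_t nb = strxfrm(NULL, b, 0) + 1;
    char *ta = malloc(na);
    char *tb = malloc(nb);
    int result = -1;
    if (ta != NULL && tb != NULL) {
        strxfrm(ta, a, na);
        strxfrm(tb, b, nb);
        result = sign(strcmp(ta, tb)) == sign(strcoll(a, b));
    }
    free(ta);
    free(tb);
    return result;
}
```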
I picked up the IANA reference implementation, tzcode, and am working toward integrating it into PDCLib proper. It will handle time zones and leap seconds, and it also means I will not have to maintain a separate database for time zone data: the Olson database provided by IANA will do.
At this point the code compiles correctly and has been roughly tested. There is still quite a bit of refactoring to be done before tzcode feels “native” to PDCLib (which is my goal here, as opposed to dlmalloc, which I wanted to leave as untouched as possible).
For `asctime` / `ctime` I need alternative access to the “C” strings in the time locale category, because both are defined to be locale-independent.
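Both functions produce the same fixed format in every locale; the C standard even gives a sample `asctime()` built on hard-coded English names. A sketch in that spirit, writing into a caller-supplied buffer instead of a static one (`format_asctime` is a hypothetical name, and the fields of `*tp` are assumed to be in range):

```c
#include <stdio.h>
#include <time.h>

/* The fixed "C" locale names; asctime() uses these whatever LC_TIME says.
   (char[3] without a terminator is valid C; %.3s never reads past it.) */
static const char c_wday[7][3] = { "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat" };
static const char c_mon[12][3] = { "Jan", "Feb", "Mar", "Apr", "May", "Jun",
                                   "Jul", "Aug", "Sep", "Oct", "Nov", "Dec" };

/* Formats *tp in asctime() layout into buf; 26 bytes suffice for
   four-digit years. Assumes tm_wday and tm_mon are within range. */
static char *format_asctime(const struct tm *tp, char *buf, size_t size)
{
    snprintf(buf, size, "%.3s %.3s%3d %.2d:%.2d:%.2d %d\n",
             c_wday[tp->tm_wday], c_mon[tp->tm_mon], tp->tm_mday,
             tp->tm_hour, tp->tm_min, tp->tm_sec, 1900 + tp->tm_year);
    return buf;
}
```

For the epoch (`tm_year = 70`, `tm_mday = 1`, `tm_wday = 4`, everything else zero) this yields `"Thu Jan  1 00:00:00 1970\n"`.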
A request from downstream was to add floating point support to my `printf()` implementation (which currently breaks for `%f`/`%g` et al. because it doesn't draw the accompanying value from the variable argument list; not nice!).
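The breakage is about argument consumption rather than formatting: if a conversion does not pull its argument off the variable argument list, every later conversion reads the wrong value. The hypothetical `mini_format()` below does not format floating point at all, yet stays in sync by still calling `va_arg(ap, double)` for `%f` (`float` arguments are promoted to `double`, so that is the type to fetch). `snprintf()` merely stands in for the library's own integer conversion.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mini formatter supporting %d, %f and %%. Floating point
   output is not implemented, but the double is still fetched so that the
   conversions after it read the right arguments. Requires size >= 1. */
static void mini_format(char *out, size_t size, const char *fmt, ...)
{
    va_list ap;
    size_t len = 0;
    va_start(ap, fmt);
    out[0] = '\0';
    for (const char *p = fmt; *p != '\0' && len + 1 < size; ++p) {
        char piece[32];
        if (*p != '%') {
            piece[0] = *p;
            piece[1] = '\0';
        } else if (p[1] == '\0') {
            break;                                   /* lone '%' at the end */
        } else {
            switch (*++p) {
            case 'd':
                snprintf(piece, sizeof piece, "%d", va_arg(ap, int));
                break;
            case 'f':
                (void)va_arg(ap, double);            /* consume it anyway! */
                strcpy(piece, "<fp>");
                break;
            case '%':
                strcpy(piece, "%");
                break;
            default:
                strcpy(piece, "?");                  /* unsupported conversion */
                break;
            }
        }
        size_t n = strlen(piece);
        if (n > size - 1 - len)
            n = size - 1 - len;                      /* truncate, keep room for NUL */
        memcpy(out + len, piece, n);
        len += n;
        out[len] = '\0';
    }
    va_end(ap);
}
```

With this, `mini_format(buf, sizeof buf, "%d %f %d", 1, 2.5, 3)` yields `"1 <fp> 3"`: the trailing `3` comes out right because the `double` before it was consumed.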
I got a good introduction to the Dragon4 binary-to-string conversion algorithm, as well as the paper on Grisu3, which speeds up the conversion by working with small, fixed-size integers (and detects the rare cases where it cannot, so a slower algorithm can take over). Big integer support is required for Dragon4 and is already implemented, so there might be another checkmark added here soon ™.
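The core of Dragon4 is exact digit generation. A double is `m / 2^k` for integers `m` and `k`, so repeatedly multiplying the remainder by ten and dividing by `2^k` yields its decimal digits without any rounding error. The sketch below shows only that loop, not Dragon4 itself (no shortest-output termination, no scaling, only `0 < x < 1`), and assumes IEEE 754 binary64 doubles. It also shows why big integers are needed: it gives up as soon as `10 * m` could exceed 64 bits, which already happens for values below 1/256.

```c
#include <stdint.h>
#include <string.h>

/* Writes the exact decimal expansion of x (0 < x < 1) into buf as "0.ddd".
   Returns 0 on success, -1 if x is out of range, would need more than
   64-bit arithmetic, or buf is too small. */
static int exact_fraction(double x, char *buf, size_t size)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);              /* assumes 64-bit IEEE 754 */
    unsigned biased = (unsigned)(bits >> 52) & 0x7FF;
    if (!(x > 0.0 && x < 1.0) || biased == 0 || size < 3)
        return -1;                               /* also rejects subnormals */
    /* x == r / 2^k, with the implicit leading mantissa bit restored */
    uint64_t r = (bits & ((UINT64_C(1) << 52) - 1)) | (UINT64_C(1) << 52);
    unsigned k = 1075 - biased;
    if (k > 60)
        return -1;                               /* 10 * r could overflow */
    uint64_t mask = (UINT64_C(1) << k) - 1;
    size_t len = 0;
    buf[len++] = '0';
    buf[len++] = '.';
    while (r != 0) {                             /* terminates: denominator is 2^k */
        if (len + 1 >= size)
            return -1;
        r *= 10;
        buf[len++] = (char)('0' + (r >> k));     /* next digit: floor(r / 2^k) */
        r &= mask;                               /* remainder: r mod 2^k */
    }
    buf[len] = '\0';
    return 0;
}
```

For example, `0.1` comes out as `0.1000000000000000055511151231257827021181583404541015625`, the exact value of the double nearest to one tenth.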
Annex K of the 2011 version of the C standard defines a number of optional library extensions. These revolve around thread safety and buffer safety. They are generally thought to do a rather poor job of it (they very much carry Microsoft's handwriting). However, when I was asked to provide a reentrant version of `strtok`, I opted for providing the Annex K version (`strtok_s`) instead of opening the can of worms that is POSIX extensions (`strtok_r`). While the latter would admittedly have been the better solution, I simply did not want to go there, as I would have saddled myself with partial POSIX support forever. And `strtok_r` can be implemented in terms of `strtok_s`, providing an acceptable solution for the immediate demand.
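To illustrate that last point: `strtok_s()` tracks the remaining buffer length through its `s1max` parameter, while `strtok_r()` has no length parameter at all, so a wrapper can simply derive the bound from the string on each call. Since glibc does not ship Annex K, this sketch carries its own simplified stand-in for `strtok_s()` (hypothetical `k_strtok_s`, using `size_t` for `rsize_t` and returning `NULL` where Annex K would invoke the runtime-constraint handler); `k_strtok_r` is likewise a hypothetical name.

```c
#include <string.h>

/* Simplified stand-in following the strtok_s() parameter list.
   *s1max counts the characters left in the buffer and is updated. */
static char *k_strtok_s(char *s1, size_t *s1max, const char *s2, char **ptr)
{
    if (s1max == NULL || s2 == NULL || ptr == NULL)
        return NULL;
    if (s1 == NULL && (s1 = *ptr) == NULL)
        return NULL;
    /* Skip leading delimiters. */
    while (*s1max > 0 && *s1 != '\0' && strchr(s2, *s1) != NULL) {
        ++s1;
        --*s1max;
    }
    if (*s1max == 0 || *s1 == '\0') {
        *ptr = s1;
        return NULL;                    /* no more tokens */
    }
    char *token = s1;
    /* Scan to the end of the token. */
    while (*s1max > 0 && *s1 != '\0' && strchr(s2, *s1) == NULL) {
        ++s1;
        --*s1max;
    }
    if (*s1max == 0)
        return NULL;                    /* unterminated buffer: constraint violation */
    if (*s1 != '\0') {
        *s1++ = '\0';                   /* terminate the token in place */
        --*s1max;
    }
    *ptr = s1;
    return token;
}

/* strtok_r() on top of it: the bound is recomputed from the string itself
   on every call, since strtok_r() carries no size. */
static char *k_strtok_r(char *s, const char *delim, char **saveptr)
{
    const char *start = (s != NULL) ? s : *saveptr;
    if (start == NULL)
        return NULL;
    size_t max = strlen(start) + 1;     /* includes the terminator */
    return k_strtok_s(s, &max, delim, saveptr);
}
```

Tokenizing `"a,b,,c"` on `","` with repeated `k_strtok_r()` calls yields `"a"`, `"b"`, `"c"`, then `NULL`.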
However, for now I am stuck with partial Annex K support as well, which is a shame: there is a mechanism for the library to signal Annex K support to the application programmer (the `__STDC_LIB_EXT1__` macro), and right now this mechanism in PDCLib is lying, one way or another. I'd like to make the support complete, regardless of the poor design of Annex K as a whole.
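From the application side, the mechanism looks like this: the program defines `__STDC_WANT_LIB_EXT1__` before including any standard header, and the library defines `__STDC_LIB_EXT1__` (as `201112L`) only if it conforms to all of Annex K. A partially supporting library can therefore either define the macro and overpromise, or leave it undefined while still shipping some `_s` functions. A minimal detection sketch (the function names are hypothetical):

```c
/* Ask for the Annex K declarations; must precede every standard header. */
#define __STDC_WANT_LIB_EXT1__ 1
#include <string.h>

/* Returns the library's Annex K version (201112L for C11) if it claims
   full conformance, or 0 if it makes no such claim. */
static long annex_k_version(void)
{
#ifdef __STDC_LIB_EXT1__
    return __STDC_LIB_EXT1__;
#else
    return 0L;
#endif
}

/* Application code can then pick strtok_s() or a fallback at compile time. */
static const char *tokenizer_in_use(void)
{
#ifdef __STDC_LIB_EXT1__
    return "strtok_s";
#else
    return "fallback tokenizer";
#endif
}
```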
There has been quite some ad-hocery going on with the test drivers lately. Coverage is not nearly as complete as I would like it to be. Many functions are without a test driver altogether; others (like the Annex K functions) have made exceptions for `REGTEST` because general library support was not available, but they gloss over that quietly where they should probably report `NO_TESTDRIVER` or some such. I will have to go over all the tests once more to take out the slack.
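A sketch of what reporting such a gap could look like, with hypothetical `TESTCASE` / `NO_TESTDRIVER` macros (named after the text, not copied from PDCLib's test headers) and a guard on `__STDC_LIB_EXT1__`, so that a regression run against a library without Annex K records a skip instead of a silent pass:

```c
#define __STDC_WANT_LIB_EXT1__ 1
#include <stdio.h>
#include <string.h>

static int tests_run, tests_failed, tests_skipped;

/* TESTCASE counts and checks one condition; NO_TESTDRIVER records that a
   check could not run at all, so the summary shows it instead of nothing. */
#define TESTCASE(x)                                                      \
    do {                                                                 \
        ++tests_run;                                                     \
        if (!(x)) {                                                      \
            ++tests_failed;                                              \
            fprintf(stderr, "FAIL %s:%d: %s\n", __FILE__, __LINE__, #x); \
        }                                                                \
    } while (0)
#define NO_TESTDRIVER(why)                                               \
    do {                                                                 \
        ++tests_skipped;                                                 \
        fprintf(stderr, "SKIP %s: %s\n", __FILE__, (why));               \
    } while (0)

static void test_strtok_s(void)
{
#ifdef __STDC_LIB_EXT1__
    char buf[] = "a,b";
    rsize_t max = sizeof buf;
    char *save = NULL;
    char *tok = strtok_s(buf, &max, ",", &save);
    TESTCASE(tok != NULL && strcmp(tok, "a") == 0);
#else
    /* e.g. a REGTEST build against a system library without Annex K */
    NO_TESTDRIVER("library under test has no strtok_s()");
#endif
}

/* Prints a summary; returns nonzero if anything failed. */
static int report(void)
{
    printf("%d run, %d failed, %d skipped\n", tests_run, tests_failed, tests_skipped);
    return tests_failed != 0;
}
```

The design point is that a skipped check still shows up in the summary, so "nothing failed" can no longer be confused with "nothing was tested".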