Here I jot down thoughts, roadmaps, to-do's and other things related to PDCLib. Newest entry first.
This past month seemed a lot longer than a month. There had been productivity elsewhere, new professional challenges, and some private heartbreak.
No progress on the library though. I hope this is excusable.
The reimplementation is online. Please do pull the new version. I am a bit embarrassed at the poor quality of the previous attempt, and how long it took me to actually realize.
Turns out my printf( “%a” ) support was (is) not only inefficiently implemented, but also bugged in multiple ways. I am in the process of reimplementing the whole thing in a bit more robust way.
I decided to shut down the Subversion repository (which had been the master repo until now). I still consider Subversion to be the better option for a small project like PDCLib, but I guess it is time to move on. I basically need the practice with Git, so… yea.
Technically it is not that much, but it feels like a huge step forward – PDCLib now supports printing floating point values using the %a conversion specifier.
Why this weird specifier that is not used by anybody out there? Because it is the one that works without changing the base of the mantissa, i.e. this is the one format that avoids all the issues of the other FP conversions. And having the ability to print some kind of FP output will help immensely when debugging the other conversions.
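To illustrate why %a is such a handy debugging aid, here is a minimal sketch using only standard library calls. Since the hexadecimal mantissa is printed without a base conversion, the text can be parsed back to exactly the same value. The function name is made up for this example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Format `value` with %a, then parse the text back with strtod().
   Returns 1 if the round trip reproduced the value exactly. */
int hexfloat_round_trips(double value)
{
    char buffer[64];
    snprintf(buffer, sizeof buffer, "%a", value);
    return strtod(buffer, NULL) == value;
}
```

Even a value like 0.1, which has no exact decimal representation as a double, survives this round trip unchanged.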
Back in the groove with a new employer. In pursuit of floating point support, I pulled apart the rather monolithic _PDCLIB_print.c and did some cleanups.
Big integer support is mostly done, so I can try my hand at implementing the Dragon4 algorithm for float support in printf().
Sorry for the long silence. An opportunity has opened up for me employment-wise which, however, requires my full attention.
I hope to return to PDCLib by mid '21.
I was asked to add floating point support to my printf() function family. I have been looking into the Dragon4 algorithm, which seems to be “the thing to do” here. This requires some bigint support for the high-precision conversions; I have started to add functions to that end.
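As a rough idea of what such bigint support looks like, here is a minimal sketch of one typical building block: multiplying a big integer by a small factor (for example 10, when producing decimal digits). The struct layout, the capacity and all names are assumptions made for this example, not PDCLib's actual code.

```c
#include <stddef.h>
#include <stdint.h>

#define BIGINT_DIGITS 8 /* hypothetical fixed capacity: 8 * 32 = 256 bits */

/* Hypothetical big integer: little-endian array of 32-bit "digits". */
struct bigint
{
    uint32_t digit[BIGINT_DIGITS];
    size_t   size; /* number of digits in use */
};

/* Multiply `value` in place by a small factor.
   Returns 0 on success, -1 if the result does not fit (in which case
   the stored value is truncated). */
int bigint_mul_small(struct bigint *value, uint32_t factor)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < value->size; ++i)
    {
        uint64_t product = (uint64_t)value->digit[i] * factor + carry;
        value->digit[i] = (uint32_t)product; /* low 32 bits stay here */
        carry = product >> 32;               /* high 32 bits move on  */
    }
    if (carry != 0)
    {
        if (value->size == BIGINT_DIGITS) return -1;
        value->digit[value->size++] = (uint32_t)carry;
    }
    return 0;
}
```

The 64-bit intermediate cannot overflow, as the largest possible product plus carry is still below 2^64.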
Two functions in <stdio.h> (fread(), fgetpos()) did not handle ungetc()'ed characters correctly. Fixed.
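For reference, this is the behavior the standard asks for, sketched with standard library calls only: a character pushed back with ungetc() has to be the first thing a subsequent fread() delivers. The function name is made up for this example, and it assumes tmpfile() is usable on the platform.

```c
#include <stdio.h>

/* Write "abc" to a temporary file, read one character, push back a
   different one, then fread() two characters into `out`.
   Returns the number of characters fread() delivered, 0 on setup failure. */
size_t read_after_ungetc(char out[2])
{
    FILE *stream = tmpfile();
    size_t count = 0;
    if (stream == NULL) return 0;
    if (fputs("abc", stream) != EOF)
    {
        rewind(stream); /* also switches the stream from writing to reading */
        if (fgetc(stream) == 'a' && ungetc('X', stream) == 'X')
        {
            count = fread(out, 1, 2, stream);
        }
    }
    fclose(stream);
    return count;
}
```

A correct implementation fills `out` with the pushed-back 'X' followed by 'b' from the file.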
I erased my previous work on tzcode, and started anew. This time, I kept the original mostly untouched for the initial setup (instead of trying to refactor major parts of it as I go, the way I tried in the first go). This means that, at this point, I got a lot of code in the PDCLib repo that is… well… unkempt. Also, no documentation. But the <time.h> functions work now, even if the testing is rudimentary. For one, the local time function tests will fail in different time zones, because I cannot set a specific time zone for the test yet…
This is a dirty hack, but it gives me a base from which to refactor functions/_tzcode till it feels like a true part of PDCLib.
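The time zone problem with the tests is specific to the local time functions. gmtime() does not depend on the configured time zone, so a test against it can use fixed expectations. A minimal sketch, assuming time_t counts seconds since 1970-01-01 00:00:00 UTC (true on POSIX systems, not guaranteed by the C standard); the function name is made up for this example.

```c
#include <time.h>

/* Break down `seconds` (assumed to count from 1970-01-01 00:00:00 UTC)
   and return the hour of day in UTC, or -1 on failure. */
int utc_hour(time_t seconds)
{
    struct tm *utc = gmtime(&seconds);
    return (utc != NULL) ? utc->tm_hour : -1;
}
```

The same check written against localtime() would give different results depending on where the test machine thinks it is.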
The last two months had been… unsavory. I had a lot of things on my hands, and unfortunately had to drop the ball on PDCLib for some time.
I've returned to the keyboard though. The current work will take some more polish before being checked in, but I am confident that I tackled the tzcode issue from the correct angle this time.
Well, that was to be expected. In my effort to untangle the internal data flows of tzcode, I have painted myself into a corner. Nothing serious really, but something that requires a couple hours of uninterrupted focus. Which is hard to come by currently… I hope to get this done over the long weekend.
I've made inroads on tzcode, using it as a basis and reference for reading the Olson database for time zone and leap second information, in order to (you guessed it) provide actually functional implementations of the <time.h> functions.
As opposed to dlmalloc, which I assimilated more or less unchanged into PDCLib so it serves as its malloc / free implementation, tzcode requires a bit more work. I am not adopting the code, but am more or less rewriting the functionality along its general lines. I might get into some more detail on the why and how when I am satisfied that I am on the right track; right now I am still testing the waters so to speak.
Progress has slowed a bit toward the end of the holidays; both work and real life have caught up with me again. But my partner has shown a remarkable interest in what I'm doing with PDCLib, and I guess that will keep me doing it whenever time allows.
See that strike-through text in the previous entry? I went ahead and implemented that change anyway, because I was a bit fed up that types kept being “a bit off” whenever I switched platform. For the record, I am working on-and-off on either a desktop, a netbook (both x86_64 Linux), a Raspberry Pi (ARM Linux), my mobile (ARM Android, and yes, I am actually working on that at times), and more recently I got a Windows laptop (for home office work) which allowed me to double-check x86_64 Cygwin, MinGW 32 and 64bit compilations. Which had apparently stopped working some time ago because the types weren't set up correctly by _PDCLIB_config.h. It was time to do an overhaul of the whole type handling.
One thing that had bugged me (pun intended) for a long time was that I originally implemented the leastN_t types in terms of the exact-width intN_t types. That was bass-ackward because the latter are optional and the former aren't. I also relied on _PDCLIB_config.h being set up “just right” instead of using compiler predefines, and of course manual setup gets it wrong from time to time.
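A minimal sketch of the "right way around", using macros that GCC and clang predefine. The my_ / MY_ names are made up for this example and are not the ones PDCLib uses; the fallback branch is likewise an assumption for compilers that lack the predefines.

```c
#include <limits.h>

/* The mandatory least-width types come first, taken from the compiler. */
#if defined(__INT_LEAST16_TYPE__) && defined(__UINT_LEAST16_TYPE__)
typedef __INT_LEAST16_TYPE__  my_int_least16_t;
typedef __UINT_LEAST16_TYPE__ my_uint_least16_t;
#else
/* Fallback: short is guaranteed to have at least 16 bits. */
typedef short          my_int_least16_t;
typedef unsigned short my_uint_least16_t;
#endif

/* The exact-width type is optional: define it only if the compiler
   announces one, and never build the least-width type on top of it. */
#if defined(__INT16_TYPE__)
typedef __INT16_TYPE__ my_int16_t;
#define MY_HAVE_INT16 1
#endif

/* Compile-time sanity check instead of trusting manual configuration. */
_Static_assert(sizeof(my_int_least16_t) * CHAR_BIT >= 16,
               "least16 type is too narrow");
```

Running `gcc -dM -E - < /dev/null` (or the clang equivalent) lists the predefines a given compiler provides.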
So I sat down, wrote little test programs, and ran those on all the platforms at my disposal to figure out what was actually required. I also made an overview of what GCC / clang provided (which is almost identical across platforms and compilers, but not completely). This was not only for the types mentioned above, but (because that was what I was originally working on) for clock_t and time_t as well. For obvious reasons, these have to fit the types used by the platform API.
In the end this necessitated a complete rework of all the files affected – _PDCLIB_config.h, _PDCLIB_int.h, stdint.h, and inttypes.h. It took a while to figure out what actually belonged where, and how the logic could work out, but I think I got it right eventually. Should any problems occur at your end due to this change, please tell me so I can adjust the screws.
But right now it's 4 AM. I am happy this is checked in, but I am even more happy to go to bed now. I guess I will take a break tomorrow (today?) and enjoy a day of real vacation for a change.
We had a one-hour power outage this morning… and it took me another hour to figure out I had misconfigured the server so it didn't spin up on its own after the power came back. Sorry.
On the other hand, I'm chunking away at the <time.h> implementation, using IANA's reference implementation (which is public domain) as a guide on time zone / leap second handling.
While I am at it, I'm making some changes to internal plumbing as well, reducing the amount of #if guesswork in _PDCLIB_config.h in favor of using GCC / clang predefines – hoping to finally get PDCLib to compile properly on all my test platforms, including Cygwin, MinGW, and the occasional Termux compile on the road. (Postponed, this turned out to be more of a change than I was willing to do on the side.)
Between Stefan Schmidt's gmtime() contribution, which I promised to review, and the sorry not-even-half-implemented state of <time.h>, I decided to make that my next “action item”.
Stay healthy, stay at home, meet you on the flip side.
New priority is <threads.h>. During some tests related to thread-local errno support (which is now implemented), I found some serious flaws in my implementation, most importantly handling of result codes (thrd_exit() / thrd_join()) and failure codes. This should not be too hard though.
I pushed the reworked freopen() (and flanking work) to git just now. There is a lot yet to be done (like proper setting of errno), but at least this rework should stomp on various errors lurking in the old freopen().
Sorry for keeping quiet for so long.
There has been activity in the repo, I just didn't find the leisure to make a blog post. There had been fixes to the *scanf() functions, and a lot of peripheral work regarding the freopen() rewrite, which hopefully improved overall code quality.
I am in the last throes of freopen(); the new code is done, but I got a bug affecting stdin reopening, as used by the *scanf() test drivers, which is why I haven't committed that work yet.
I got an implementation for gmtime() contributed by downstream, which I will review ASAP (but no sooner), and then I guess I'll get cracking at one of the numerous other construction sites, with an eye on getting the number of such construction sites down so this doesn't “feel” so bad anymore.
Not much to report as I have been focussing on private matters (including a renovation project).
Another downstream request was to add floating point output to my printf() implementation. I've had a look at this excellent presentation of the Dragon4 algorithm, as well as Florian Loitsch's paper on the Grisu improvement, and found my spontaneous reaction of “definitely not at this point in time” to be justified…
But I took note of the state of the art, and will study it when the time comes…
Coming back to my “to-do list” from early June, things are a bit ugly.
- <time.h> is only partially implemented. All the date functions are awaiting locale support (for figuring out time zones, and leap second handling).
- <stdio.h> and <time.h> are yet missing. Annex K <time.h> faces the same issue as regular <time.h> – locale support.
- <threads.h> still isn't thoroughly tested. Integration of thread-local storage is missing, which makes handling of errno, locale etc. non-compliant at this time.
- freopen() still isn't fixed.
- memcpy / strcpy functions, but I am afraid to open yet another can of worms.
- <math.h> looming on the horizon.

I feel a bit overwhelmed at the moment, as there are so many construction sites, and no easy way to reduce their number anytime soon.
I guess fixing freopen() is the easiest among them, but then things get… interesting.
My exit code – the one handling process termination on exit() – apparently never worked. Streams did not get flushed and closed, buffered output got lost. This is not good, and I profusely apologize to all.
In fixing that particular bug, I came to realize that, while exit() now does what it is intended to, a return from main() still doesn't. Apparently I never actually solved the issue. Until I do, be warned: A return from main(), at this point, does not close open streams, or indeed call any of the functions registered with atexit(). I have to figure out how to make this happen.
Note that it is the duty of the C runtime support code – the part that actually calls main() – to call exit() with what main() returned[1][2]. PDCLib does not come with C runtime support code, as that is platform specific. I should have a line or two about that in my Readme…
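The division of labor described above can be sketched in a few lines. Both function names are hypothetical; real runtime support code is platform specific and does considerably more (stack setup, argc / argv, environment) before getting to this point.

```c
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for the user's main(); the name is hypothetical. */
int program_main(int argc, char **argv)
{
    (void)argc;
    (void)argv;
    fputs("buffered output", stdout); /* relies on exit() for flushing */
    return 3;
}

/* Hypothetical entry point of the runtime support code. */
void runtime_entry(int argc, char **argv)
{
    /* exit() calls the atexit() handlers, flushes and closes all open
       streams, and only then hands the status to the platform. */
    exit(program_main(argc, argv));
}
```

If the runtime code skipped the exit() call and terminated the process directly, the buffered output above would be lost, which is exactly the symptom described.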
I finished strtok_s(), and am having a look at the remaining Annex K functions… those in <string.h> are easy enough, but I shudder a bit at the thought of diving into the *printf() implementation to get the bounds checking implemented…
As a summary update, my to-do list:
- strtok_s() as per C11 Annex K (due to popular demand)
- freopen()
- <threads.h> implementation
- <locale.h> et al.
Turns out the fseek issue was easily fixed, thanks to the high-quality bug report. Take that for a Monday.
Yes, I am aware of breaking bugs in the current PDCLib. There's something wrong with the thread implementation still, there *might* be some problems with the dlmalloc integration, and I also know of issues with freopen and fseek. It's a bit overwhelming right now – I thought I'd be looking at a mostly functional build until a simple test program convinced me otherwise…
But I am definitely working on it. Especially since that fseek issue has been brought to my attention by a group of PDCLib early adopters, which I am rather keen to support, as they have provided me with valuable feedback in the past.
So… for now, you're probably better off looking to SVN revision 769 (pre 2019-04-16) if what you want is a halfway-stable, functional PDCLib (that's using a makeshift memory management and is strictly single-threaded). You'd still have to accept the problems with freopen (possible resource leaks) and fseek (probably completely broken, I'm still looking into it).
So much to code, so little time…
Adapting my <threads.h> solution from x86_64-Linux to Raspbian Linux went surprisingly smoothly (despite the jump scare I got the first time around when I forgot to adjust the settings in _PDCLIB_config.h, as you can see from the repository log…).
Then I tried to adapt it to Windows / MinGW, just for the sake of giving it a try, and… oh, my. OK, there has to be some more work poured into this.
(Among other things, Windows / MinGW does some things very differently in pthreads.h, most importantly the data structures not being data structures at all but typedef'ed void *, so most of what I did in pthread_readout does not help – instead it very much gets in the way.)
Ah well. We're further down the road than we were a week ago, so all is good I guess.
Back from vacation, and got around to committing the <threads.h> implementation.
I know of the following shortcomings at this point:
- <errno.h> needs to be thread-specific storage; I am thinking about how to initialize things that way.
- freopen() is flaky, probably broken in more than one way. I am working on that, but wanted the rest of the code committed right away, for backup purposes if nothing else.
On the upside, most of <stdio.h> (with the exception of aforementioned freopen()) is thread-safe, as are the memory management functions.
Enjoying two weeks of vacation at the North Sea, I spend quite some time relaxing at the keyboard. (Yes, this can be actually relaxing, if you go at it the right way.)
I integrated dlmalloc (using default settings only for the time being), and am making some progress toward implementing <threads.h>. That was not at all on the to-do list, since it's C11 and I claimed that as being out of scope until I got C99 covered. But as I received feedback from several adopters of PDCLib, and the subject of multithreading support popped up in almost every single one, I bowed to popular demand.
The example platform will implement <threads.h> as a wrapper for pthread, but it should be comparatively easy to come up with other adaptions. Note that contributions supporting other mainstream APIs and / or platforms will always be welcome!
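To give an idea of what such a wrapper involves, here is a minimal sketch of thread creation and joining on top of pthread. All my_ names are made up for this example to avoid clashing with a real <threads.h>; this is not PDCLib's actual code. The one non-obvious part is that C11 thread functions return int while pthread start routines return void *, so a small trampoline has to translate between the two.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

typedef int (*my_thrd_start_t)(void *);
typedef pthread_t my_thrd_t;
enum { my_thrd_success = 0, my_thrd_error = 1 };

struct my_start_info { my_thrd_start_t func; void *arg; };

/* Runs in the new thread: calls the int-returning function and smuggles
   its result through pthread's void * return value. */
static void *my_trampoline(void *raw)
{
    struct my_start_info info = *(struct my_start_info *)raw;
    free(raw);
    return (void *)(intptr_t)info.func(info.arg);
}

int my_thrd_create(my_thrd_t *thr, my_thrd_start_t func, void *arg)
{
    struct my_start_info *info = malloc(sizeof *info);
    if (info == NULL) return my_thrd_error;
    info->func = func;
    info->arg  = arg;
    if (pthread_create(thr, NULL, my_trampoline, info) != 0)
    {
        free(info);
        return my_thrd_error;
    }
    return my_thrd_success;
}

int my_thrd_join(my_thrd_t thr, int *res)
{
    void *raw;
    if (pthread_join(thr, &raw) != 0) return my_thrd_error;
    if (res != NULL) *res = (int)(intptr_t)raw;
    return my_thrd_success;
}

/* Tiny thread function for trying things out. */
int my_demo_increment(void *arg)
{
    return *(int *)arg + 1;
}
```

Depending on the platform, this may need to be compiled and linked with -pthread.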
It's also simpler to implement those pthread wrapper functions than digging through the Unicode specs.
Once I got the functionality nailed down, I will wade through the existing code to implement thread safety as required. (Looking at you, <stdio.h>…) I might add some C11 extensions while I am at it (strtok_s was among the requested functions, and I do not see a reason not to oblige, really).
So… yes. Progress is being made.
It's been a long time since I last did anything with / for PDCLib, but I won't make excuses for it. I just could not get myself to dig into that Unicode standard again. And as I said to a fellow developer some time ago:
A hobby should always be a CAN do, not a TO do. Have a good hard look at what each of your hobbies is giving you, and be ready to drop hobbies that drain your energy instead of recharging it.
After the ePub debacle, and due to several other (private) issues, my energy was drained. So I focussed on more enjoyable things… but I'm back.
Since I still could not bear the thought of going full Unicode mode again, I had a look at integrating Doug Lea's malloc(), properly this time, to replace the makeshift malloc() / free() implementations PDCLib currently “offers”.
To do this with a minimum of changes to the dlmalloc() code (desirable because easier to maintain facing future changes), that meant I had to tackle the issue of symbol visibility (dllexport), which dlmalloc() supports and PDCLib doesn't (yet).
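For readers unfamiliar with the topic, symbol visibility is commonly handled with a macro along the following lines. This is a generic sketch; the MY_ names are made up for this example and are neither dlmalloc's nor PDCLib's.

```c
/* Normally passed on the command line (-DMY_BUILDING_LIBRARY) when
   compiling the library itself; defined here to keep the example
   self-contained. */
#define MY_BUILDING_LIBRARY 1

#if defined(_WIN32) || defined(__CYGWIN__)
  #ifdef MY_BUILDING_LIBRARY
    #define MY_PUBLIC __declspec(dllexport) /* building the DLL */
  #else
    #define MY_PUBLIC __declspec(dllimport) /* using the DLL    */
  #endif
#elif defined(__GNUC__)
  #define MY_PUBLIC __attribute__((visibility("default")))
#else
  #define MY_PUBLIC
#endif

/* A function that should be visible to users of the shared library. */
MY_PUBLIC int my_library_answer(void);

int my_library_answer(void)
{
    return 42;
}
```

With GCC / clang, the attribute only makes a difference when the library is compiled with -fvisibility=hidden, which hides everything not explicitly marked.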
That in turn meant I had to test the stuff, which in turn meant it was time to enable building PDCLib as a shared library instead of the static one it currently is. But that meant touching Makefile… and that thing, while I liked its results, was not exactly a beauty to behold in an editor.
So I started working toward supporting CMake, which would bring several other benefits as well. And today I committed the first version of just that, so…
Let's see if I get back on track on this.
Quickly saving a link for later reference: What Every Computer Scientist Should Know About Floating-Point Arithmetic.
The ePub conversion was a dead end; I should have spent the time reading instead of working on “conversion to better readable format”. So now I am looking at wasted time, a reading backlog, and lots of things I neglected while working on the now-abandoned conversion.
*sigh*
Since I was asked, I thought I could just as well give the answer here:
Why are you doing <locale.h> first? I would think floating point support would be more important.
Three reasons, really. The first is just a minor snag – FP I/O is locale-dependent (decimal point vs. decimal comma).
The second is that, to do the FP logic right (instead of naïve 80-20 solutions), you need to take lots of platform specifics into account. This will blow up <_PDCLIB_config.h> significantly, and result in lots of rather ugly conditional code.
Third, it is quite simply the area I have the least expertise in. I want to save the hardest part for last.
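The first reason can be seen with standard library calls alone: the decimal point that printf() emits is whatever the current locale says it is. A minimal sketch; the function name is made up for this example.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Format `value` with two decimals and report whether the output
   contains the decimal-point string of the current locale. */
int printed_with_locale_point(double value)
{
    char buffer[64];
    const char *point = localeconv()->decimal_point;
    snprintf(buffer, sizeof buffer, "%.2f", value);
    return strstr(buffer, point) != NULL;
}
```

In the "C" locale the decimal point is "."; in a German locale (if one is installed and selected via setlocale()) it would be ",".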
Sometimes we find ourselves approaching new technologies from rather unexpected angles. Right now I am working on an ePub conversion of The Unicode Standard for easier reading, as PDF handles poorly on my tolino ebook reader.
I would probably never have bothered with looking into the ePub format if it had not been for PDCLib… we live and learn.
There is no way around it. Too much of the whole ctype, wctype, uchar, locale issue is pointing to Unicode all over again. And I have been cursing at getting tangled by lots of cross-references and internal dependencies, so now I made myself sit down and tackle the monster that is The Unicode Standard. From cover to cover, as there seems to be no real shortcut to “just what I need right now”.
So… yeah. Stay put.
In these past two days, I learned a lot about the Unicode Collation Algorithm. Yes, I can do this, I can make this part of the PDCLib.
But no, not in the immediate future. That will have to “make do” with the “C” locale.
I have added _PDCLIB_load_lc_*() functions for all the locale categories mandated by C99, plus LC_MESSAGES, a C99-compliant POSIX extension that is required anyway for strerror() and perror() to be locale-aware.
The one thing left is LC_COLLATE. Collation in the C locale is comparatively simple, but Unicode aware collation?
Let's just say that the corresponding Unicode document, converted to PDF for easier offline reading, amounts to 61 pages. I will have to dig through that at some point, so why not now.
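How simple the "C" locale case is can be sketched like this: there, strcoll() is typically expected to order strings the same way the plain byte-wise strcmp() does. The function names are made up for this example, and note that the function changes the LC_COLLATE setting of the calling program.

```c
#include <locale.h>
#include <string.h>

/* Reduce a comparison result to -1, 0 or 1. */
static int sign(int value)
{
    return (value > 0) - (value < 0);
}

/* Switch collation to the "C" locale and check whether strcoll() and
   strcmp() agree on the order of the two strings.
   Returns 1 if they agree, 0 if not (or if the locale cannot be set). */
int c_collation_matches_strcmp(const char *a, const char *b)
{
    if (setlocale(LC_COLLATE, "C") == NULL) return 0;
    return sign(strcoll(a, b)) == sign(strcmp(a, b));
}
```

Under a Unicode-aware locale the two would be free to disagree, for example on the relative order of "apple" and "Banana".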
Bah. Think first. There already is a function to load contents for the various locale-data structures from file, and its name is setlocale().
Also, while loading from the filesystem is rather “raw”, any other mechanic will be even more “raw”, and less standard (as in, <stdio.h>).
So stop dithering and make setlocale() do more than return NULL;.
Looking at what I already had in <locale.h>, I decided some reworking was required. Stuffing everything into struct lconv was not the smartest idea I had, so I did split things up into separate struct _PDCLIB_lc_*. I also moved the extern declaration of the actual data instances from <locale.h> to <_PDCLIB_int.h> where they are less confusing to the casual observer.
I am currently thinking in terms of _PDCLIB_load_lc() to load contents for the various locale-data structures from file. I do not like the idea of having raw filesystem access inside PDCLib, though… this needs some pondering.
With get-uctypes (the source of which is in the repo at auxiliary/uctype/), I now have a program to get character classification information (as required by <ctype.h> and, more importantly, <wctype.h>) directly from data files available from unicode.org.
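As an illustration of the kind of processing involved, here is a minimal sketch that pulls code point and general category out of one line of UnicodeData.txt, assuming that file is among the data files used. The function name is made up for this example; it is not taken from get-uctypes.

```c
#include <stdio.h>

/* Parse one line of UnicodeData.txt: semicolon-separated fields, with
   the code point (hexadecimal) in field 0, the character name in
   field 1, and the two-letter general category in field 2.
   Returns 1 on success, 0 if the line does not match. */
int parse_unicode_data_line(const char *line, unsigned long *code_point,
                            char category[3])
{
    /* %*[^;] skips the name; %2[A-Za-z] reads the category and
       terminates it with a null character. */
    return sscanf(line, "%lx;%*[^;];%2[A-Za-z]", code_point, category) == 2;
}
```

For the line describing LATIN CAPITAL LETTER A this yields code point 0x41 and category "Lu" (uppercase letter), which is the information an iswupper() implementation needs.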
The shepherd branch already had this functionality, but it was a) written in Python (which IMHO has no place in the source tree of a C library); b) including the raw data files which made them prone to getting outdated and required additional legalese added due to Unicode licensing; c) not giving correct results, and more importantly, not offering an easy way to test against the system library's results.
Now I have to provide a way to actually use the derived information in PDCLib proper.