Couldn’t leave well enough alone over the weekend, so I poked at the Unicode situation. Much ranting on social media later, I think I’ve figured out what’s happening, and for a change it wasn’t my own fault.
My Mastodon display name is Karen C(pretzel emoji)(old woman)(pine tree)(ocean). It’s geographic: there’s Philly, there’s me, there’s the Pinelands, there’s the Atlantic. (It used to be pine-woman-ocean, and I kind of miss the ocean a lot, but I also like being closer to the city.) It’s also problematic for Wirebird at the moment.
Sometimes it came across as a string that was pretty clearly double- or even triple-escaped, but decoding it would sometimes result in emoji and other times result in an error because it was trying to over-decode things. The inconsistency was a little puzzling, so I threw in some
I’m using XML::Feed to process both RSS and Atom feeds. It’s a wrapper around XML::RSS and XML::Atom, which are by two different authors (neither of whom is the XML::Feed author), and XML::Feed papers over the inconsistencies… most of the time. But Mastodon’s RSS feeds are not coming in the same way its Atom feeds are.
The solution seems to be to give the two ::XMLFeed libraries a subroutine that checks for the flag and, if it’s not set, decodes the string (and sets the flag).
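Roughly, that subroutine could look like the sketch below. It assumes "the flag" is Perl's internal UTF8 flag as reported by `utf8::is_utf8`, and that any unflagged string is UTF-8 bytes; the name `ensure_decoded` is made up for illustration and isn't the actual Wirebird code.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (name and behaviour are assumptions for illustration).
# Decode a string at most once, using the UTF8 flag as the "already done" marker.
sub ensure_decoded {
    my ($str) = @_;
    return $str unless defined $str;

    # Flag already on: treat it as a character string and leave it alone,
    # so we never try to decode the same text a second time.
    return $str if utf8::is_utf8($str);

    # Flag off: assume these are UTF-8 bytes and turn them into characters.
    # For non-ASCII input the decoded result comes back with the flag on.
    return decode('UTF-8', $str);
}
```

Calling it on the four raw bytes of the pretzel emoji (`"\xF0\x9F\xA5\xA8"`) should give back a single character, and calling it again on that result should change nothing. The caveat is that the flag is an internal detail rather than a promise about what a string contains, so this only helps when the unflagged strings really are UTF-8 bytes.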
It’ll take looking at a lot more feeds to make sure this is working as I expect; Mastodon seems to be doing everything right as far as HTTP headers and internal encoding, and not all feeds will be as well-behaved.