Still kind of thinking about the library to use for processing Atom, RSS, and other non-RDFa XML.

I ended up expanding Wirebird::Remote::XML to play around with processing the XML by hand, and as a first pass it works moderately well. Below is what each approach gets when I run the Atom feed for my Mastodon profile through it.

This is what Wirebird::Remote::RDFa (via RDF::RDFa::Parser) gets. I haven’t figured out why Mastodon’s feed gets recognized as RDFa when Plerd’s feed doesn’t, but I haven’t fed it to a validator to see yet.

<https://mastodon.social/users/gamehawk.atom> <http://www.iana.org/assignments/relation/alternate> <https://mastodon.social/@gamehawk> ;
    <http://www.iana.org/assignments/relation/avatar> <https://files.mastodon.social/accounts/avatars/000/007/209/original/1a8cc570acbd05ea.png> ;
    <http://www.iana.org/assignments/relation/header> <https://files.mastodon.social/accounts/headers/000/007/209/original/media.jpg> ;
    <http://www.iana.org/assignments/relation/hub> <https://mastodon.social/api/push> ;
    <http://www.iana.org/assignments/relation/next> <https://mastodon.social/users/gamehawk.atom?max_id=8115759> ;
    <http://www.iana.org/assignments/relation/salmon> <https://mastodon.social/api/salmon/7209> ;
    <http://www.iana.org/assignments/relation/self> <https://mastodon.social/users/gamehawk.atom> .

This is what the current Wirebird::Remote::XMLFeed gets, parsing with XML::FeedPP and then dumping things into SIOC fields.

<https://mastodon.social/@gamehawk> <http://purl.org/dc/terms/created> "2018-06-14T23:26:25Z" ;
    <http://rdfs.org/sioc/ns#description> "Wandering ex-Jayhawker (not the same as a Jayhawk, but close), currently in Jersey (Philly area). Freelance Perl coder. She/her." ;
    <http://rdfs.org/sioc/ns#feed> <https://mastodon.social/@gamehawk> ;
    <http://rdfs.org/sioc/ns#link> <https://mastodon.social/@gamehawk> ;
    <http://rdfs.org/sioc/ns#name> "Karen C\U0001F968\U0001F475\U0001F3FB\U0001F332\U0001F3D6\uFE0F" ;
    a <http://rdfs.org/sioc/ns#WebLog> .
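
The "dumping things into SIOC fields" step is essentially a lookup table. Here is a minimal sketch of the idea in Python rather than Perl; it is not the Wirebird::Remote::XMLFeed code. The predicates are the ones visible in the output above, but the feed field names on the left of the table (and the function name) are assumptions made up for illustration, and sioc:feed is left out because it isn't clear from the output which feed field it came from.

```python
SIOC = "http://rdfs.org/sioc/ns#"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical mapping: generic feed field name -> predicate IRI.
# The field names are assumptions; the predicates come from the output above.
FIELD_TO_PREDICATE = {
    "title": SIOC + "name",
    "description": SIOC + "description",
    "link": SIOC + "link",
    "pubDate": DCTERMS + "created",
}


def channel_to_triples(channel):
    """Turn a dict of already-parsed feed fields into (s, p, o) tuples."""
    subject = channel["link"]  # use the feed's link as the subject
    triples = [(subject, RDF_TYPE, SIOC + "WebLog")]
    for field, predicate in FIELD_TO_PREDICATE.items():
        value = channel.get(field)
        if value:  # skip fields the feed did not supply
            triples.append((subject, predicate, value))
    return triples


channel = {
    "title": "gamehawk",
    "link": "https://mastodon.social/@gamehawk",
    "pubDate": "2018-06-14T23:26:25Z",
}
for triple in channel_to_triples(channel):
    print(triple)
```

The weakness is the same one the real output shows: anything the table doesn't know about is silently dropped.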

This is what Wirebird::Remote::XML can get, parsing with LibXML and doing a little hard-coded processing. It’s not using the feed’s namespaces yet, just the default ones built into RDF::Trine. Prefixes are not standardized, so this is a shortcut that should only be used with tame data, if then, but it’ll do for now. The element predicates should really be in the http://www.w3.org/2005/Atom namespace rather than the p[ermanent]url RSS one RDF::Trine guessed.

<https://mastodon.social/users/gamehawk.atom> <http://purl.org/rss/1.0/id> <https://mastodon.social/users/gamehawk.atom> ;
    <http://purl.org/rss/1.0/logo> <https://files.mastodon.social/accounts/avatars/000/007/209/original/1a8cc570acbd05ea.png> ;
    <http://purl.org/rss/1.0/subtitle> "Wandering ex-Jayhawker (not the same as a Jayhawk, but close), currently in Jersey (Philly area). Freelance Perl coder. She/her." ;
    <http://purl.org/rss/1.0/title> "Karen C\U0001F968\U0001F475\U0001F3FB\U0001F332\U0001F3D6\uFE0F" ;
    <http://purl.org/rss/1.0/updated> "2018-06-14T23:26:25Z" ;
    <http://purl.org/vocab/relationship/alternate> <https://mastodon.social/@gamehawk> ;
    <http://purl.org/vocab/relationship/hub> <https://mastodon.social/api/push> ;
    <http://purl.org/vocab/relationship/next> <https://mastodon.social/users/gamehawk.atom?max_id=8121439> ;
    <http://purl.org/vocab/relationship/salmon> <https://mastodon.social/api/salmon/7209> ;
    <http://purl.org/vocab/relationship/self> <https://mastodon.social/users/gamehawk.atom> ;
    a <http://schema.org/WebPage> .
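
For comparison, here is a rough sketch of what using the feed’s own namespaces could look like. It is Python with the standard library’s ElementTree, not the Perl/LibXML code in Wirebird::Remote::XML, and the sample feed is a trimmed stand-in rather than the real Mastodon document. Two things are assumptions: joining the Atom namespace and the element name with "#" to make a predicate IRI, and mapping link rel values onto the IANA relation IRIs the RDFa parser produced above.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
IANA_REL = "http://www.iana.org/assignments/relation/"

# Trimmed stand-in for the feed, not the real document.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <id>https://mastodon.social/users/gamehawk.atom</id>
  <title>gamehawk</title>
  <updated>2018-06-14T23:26:25Z</updated>
  <link rel="alternate" href="https://mastodon.social/@gamehawk"/>
  <link rel="self" href="https://mastodon.social/users/gamehawk.atom"/>
</feed>"""


def feed_triples(xml_text):
    """Return (s, p, o) tuples for the feed's top-level children."""
    root = ET.fromstring(xml_text)
    subject = root.findtext(f"{{{ATOM_NS}}}id")
    triples = []
    for child in root:
        # ElementTree spells namespaced tags as "{namespace}localname",
        # so the prefix used in the document never matters.
        if not child.tag.startswith("{"):
            continue
        namespace, local = child.tag[1:].split("}", 1)
        if namespace == ATOM_NS and local == "link":
            # RFC 4287: a link with no rel attribute means "alternate".
            rel = child.get("rel", "alternate")
            triples.append((subject, IANA_REL + rel, child.get("href")))
        elif len(child) == 0 and child.text and child.text.strip():
            # Assumption: "#" as the separator between namespace and name.
            triples.append((subject, namespace + "#" + local, child.text.strip()))
    return triples


for triple in feed_triples(SAMPLE_FEED):
    print(triple)
```

Because the predicate is built from the namespace IRI the parser reports, an element from some other vocabulary in the same feed would land in its own namespace instead of being guessed at.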

Processing the (nicely fleshed out) author element in the feed then goes on to give us:

<https://mastodon.social/users/gamehawk> <http://purl.org/rss/1.0/email> "gamehawk@mastodon.social" ;
    <http://purl.org/rss/1.0/id> <https://mastodon.social/users/gamehawk> ;
    <http://purl.org/rss/1.0/name> "gamehawk" ;
    <http://purl.org/rss/1.0/summary> "<p>Wandering ex-Jayhawker (not the same as a Jayhawk, but close), currently in Jersey (Philly area). Freelance Perl coder. She/her.</p>" ;
    <http://purl.org/rss/1.0/uri> <https://mastodon.social/users/gamehawk> ;
    <http://purl.org/vocab/relationship/alternate> <https://mastodon.social/@gamehawk> ;
    <http://purl.org/vocab/relationship/avatar> <https://files.mastodon.social/accounts/avatars/000/007/209/original/1a8cc570acbd05ea.png> ;
    <http://purl.org/vocab/relationship/header> <https://files.mastodon.social/accounts/headers/000/007/209/original/media.jpg> ;
    "" <http://activitystrea.ms/schema/1.0/person>, "Karen C\U0001F968\U0001F475\U0001F3FB\U0001F332\U0001F3D6\uFE0F", "Wandering ex-Jayhawker (not the same as a Jayhawk, but close), currently in Jersey (Philly area). Freelance Perl coder. She/her.", "gamehawk", "public" .

(Something about the <activity:object-type>http://activitystrea.ms/schema/1.0/person</activity:object-type> line is confusing it there at the end, so I’ll have to track that down.)

So I have three resources going on here:

  • https://mastodon.social/users/gamehawk - Retrieving this with a browser redirects to…
  • https://mastodon.social/@gamehawk - … which has as a rel:alternate
  • https://mastodon.social/users/gamehawk.atom … which lists itself as its atom:id, the second link as its rel:alternate and the first as the author’s atom:id.

Between the redirect and the alternate, Wirebird should probably figure out that these are all really the same resource, but that’s for down the road.
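
One way that could eventually work, sketched in Python rather than Perl: treat every redirect and every rel:alternate as a "same as" link between two URLs and group them with a small union-find. Nothing here is Wirebird code, the function name is made up, and whether an alternate link should really count as the same resource is exactly the open question, so take it as a sketch under that assumption.

```python
def group_equivalent(pairs):
    """Group URLs connected by (url_a, url_b) 'same resource' links."""
    parent = {}

    def find(url):
        # Each URL starts out as its own group representative.
        parent.setdefault(url, url)
        while parent[url] != url:
            parent[url] = parent[parent[url]]  # path halving
            url = parent[url]
        return url

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = {}
    for url in list(parent):
        groups.setdefault(find(url), set()).add(url)
    return list(groups.values())


links = [
    # the browser redirect
    ("https://mastodon.social/users/gamehawk", "https://mastodon.social/@gamehawk"),
    # the rel:alternate between the profile page and the feed
    ("https://mastodon.social/@gamehawk", "https://mastodon.social/users/gamehawk.atom"),
]
for group in group_equivalent(links):
    print(sorted(group))
```

With the two links above, all three URLs end up in a single group.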