After a brief battle with installation*, I managed to get RDF::RDFa::Parser installed in the hopes of pulling OpenGraph information from my current guinea-pig page. Sadly, it doesn’t pull any graph information from that page, despite mentioning OpenGraph in the documentation. But it does turn the Atom feed into a whole bunch of triples.
And that’s maybe worth considering. XML::Feed can, if you look at the underlying XML::Atom::Person object, pull exactly three pieces of information about the author: name, uri, and email address. Mastodon gives the following profile:
<author>
  <id>https://mastodon.social/users/gamehawk</id>
  <activity:object-type>http://activitystrea.ms/schema/1.0/person</activity:object-type>
  <uri>https://mastodon.social/users/gamehawk</uri>
  <name>gamehawk</name>
  <email>firstname.lastname@example.org</email>
  <summary type="html"><p>Wandering ex-Jayhawker (not the same as a Jayhawk, but close), currently in Jersey (Philly area). Freelance Perl coder. She/her.</p></summary>
  <link rel="alternate" type="text/html" href="https://mastodon.social/@gamehawk"/>
  <link rel="avatar" type="image/png" media:width="120" media:height="120" href="https://files.mastodon.social/accounts/avatars/000/007/209/original/1a8cc570acbd05ea.png"/>
  <link rel="header" type="image/jpeg" media:width="700" media:height="335" href="https://files.mastodon.social/accounts/headers/000/007/209/original/media.jpg"/>
  <poco:preferredUsername>gamehawk</poco:preferredUsername>
  <poco:displayName>Karen C🥨👵🏻🌲🏖️</poco:displayName>
  <poco:note>Wandering ex-Jayhawker (not the same as a Jayhawk, but close), currently in Jersey (Philly area). Freelance Perl coder. She/her.</poco:note>
  <mastodon:scope>public</mastodon:scope>
</author>
That’s the exception more than the rule (I think) and of course Masto also presents this (and more) via its own API and its ActivityStreams profile, but if the information is there Wirebird probably ought to capture it.
And that kind of leads me to consider whether WB should be grabbing everything it can find about a profile, rather than picking the top source. I’ve left that possibility open (the bidding process returns all the bids, not just the top one) though it offers complications. RDF triples are inherently better at saving lots of values for a thing, rather than overwriting the value. This can be good or bad; if the display name, for instance, is not a 100% match you can end up with all the variants being saved and then having to decide which one is “best.” It also has to be smart about things: if one source gives a 200x200 avatar, while another gives 120x120 but also explicitly lists height/width, Wirebird has to be smart enough to know those things are related and that it shouldn’t override the 120x120 with the “better” 200x200 without also changing the stored height/width - more special cases.
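One way around the avatar problem is to treat an image and its dimensions as one unit instead of three independent values. Here is a rough sketch of that idea, in Python rather than Wirebird's actual Perl. The `Avatar` and `Profile` names and the "largest declared area wins" rule are made up for illustration, not anything Wirebird does today.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Avatar:
    """An image URL bundled with whatever dimensions its source declared."""
    url: str
    width: Optional[int] = None
    height: Optional[int] = None
    source: str = ""


@dataclass
class Profile:
    # Every distinct value is kept, triple-style, instead of being overwritten.
    display_names: set = field(default_factory=set)
    avatars: list = field(default_factory=list)

    def add_display_name(self, name: str) -> None:
        self.display_names.add(name)

    def add_avatar(self, avatar: Avatar) -> None:
        if avatar not in self.avatars:
            self.avatars.append(avatar)

    def best_avatar(self) -> Optional[Avatar]:
        """Pick one avatar; its width and height travel with it."""
        if not self.avatars:
            return None
        # Hypothetical rule: largest *declared* area wins; undeclared sizes count as 0.
        return max(self.avatars, key=lambda a: (a.width or 0) * (a.height or 0))
```

Because the choice is made between whole `Avatar` records, there is no way to end up with one source's URL and another source's height/width. Deciding which display name is "best" is still a separate judgment call.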
Going back to the HTML profile page, there’s a mishmash of information available. Let’s assume that instead of subscribing directly to the Atom/RSS feeds, Alyssa has pointed her wirebird at this profile page.
<title>Karen C🥨👵🏻🌲🏖️ (@email@example.com) - Mastodon</title>
Wirebird::Remote::HTML, the fallback for web pages that have nothing else available, will decide this is the profile’s name. Since it’s likely to involve the sitename being appended (and in this case does) I am thinking that the rule will be “parse only the sources over a minimum bid value, unless there are no bids higher than that value.”
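A minimal sketch of that rule, in Python rather than Wirebird's Perl. The function name, the (source, bid) tuple shape, and the choice to fall back to every bid when nothing clears the bar are assumptions of mine.

```python
def select_sources(bids, min_bid):
    """Return the (source, bid) pairs worth parsing, highest bid first.

    bids: list of (source_name, bid_value) pairs.
    Sources bidding over min_bid are used; if none do, fall back to
    whatever bids exist (assumed behavior for the "unless" clause).
    """
    strong = [b for b in bids if b[1] > min_bid]
    chosen = strong if strong else list(bids)
    return sorted(chosen, key=lambda b: b[1], reverse=True)
```

With this, the HTML fallback's sitename-laden title is only consulted when nothing better bid on the page.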
<meta content='4.36K Toots, 277 Following, 440 Followers · Wandering ex-Jayhawker (not the same as a Jayhawk, but close), currently in Jersey (Philly area). Freelance Perl coder. She/her.' name='description'>
Here’s another example of variances in a “standard” field - Masto adds the statistics to the front of the bio - not just an addition, but one that’s going to continuously change. Hmm.
<link href='https://mastodon.social/api/salmon/7209' rel='salmon'>
Salmon endpoint. This one’s also available in webfinger data.
<link href='https://mastodon.social/users/gamehawk.rss' rel='alternate' type='application/rss+xml'>
<link href='https://mastodon.social/users/gamehawk.atom' rel='alternate' type='application/atom+xml'>
A pointer back to the feeds; Wirebird will have to be careful about looping. For now we’ll add them to the queue.
<link href='https://mastodon.social/users/gamehawk' rel='alternate' type='application/activity+json'>
The ActivityStreams/Pub profile, very handy, also queued.
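The queueing and the loop worry can be sketched together: collect the "alternate" links, and refuse to queue anything already seen. This is Python using only the standard library, not Wirebird's Perl, and the class names are hypothetical. Keying on the (URL, type) pair rather than the URL alone is my assumption, made because the ActivityStreams link is told apart by its type.

```python
from collections import deque
from html.parser import HTMLParser


class AlternateLinkFinder(HTMLParser):
    """Collects (href, type) for every <link rel="alternate"> in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()
        if "alternate" in rels and a.get("href"):
            self.links.append((a["href"], a.get("type")))


class FetchQueue:
    """FIFO queue that never accepts the same (url, type) pair twice."""

    def __init__(self):
        self._pending = deque()
        self._seen = set()

    def add(self, url, media_type=None):
        key = (url, media_type)
        if key in self._seen:
            return False  # already queued or fetched: this is the loop guard
        self._seen.add(key)
        self._pending.append(key)
        return True

    def pop(self):
        return self._pending.popleft() if self._pending else None
```

When a feed later points back at the profile page, or at itself, `add` returns False and nothing is fetched again.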
<meta content="profile" property="og:type" />
<meta content="https://mastodon.social/@gamehawk" property="og:url" />
<meta content="Mastodon" property="og:site_name" />
<meta content="Karen C🥨👵🏻🌲🏖️ (@firstname.lastname@example.org)" property="og:title" />
<meta content="4.36K Toots, 277 Following, 440 Followers · Wandering ex-Jayhawker (not the same as a Jayhawk, but close), currently in Jersey (Philly area). Freelance Perl coder. She/her." property="og:description" />
<meta content="https://files.mastodon.social/accounts/avatars/000/007/209/original/1a8cc570acbd05ea.png" property="og:image" />
<meta content="120" property="og:image:width" />
<meta content="120" property="og:image:height" />
The OpenGraph data. There are a lot of Perl libraries for producing it, but not so many for parsing it. I guess this means I should be making Wirebird::Remote plugins more generic.
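Parsing these doesn't need much. As a rough illustration (Python standard library only, not a Wirebird plugin, and the class name and `prefixes` parameter are hypothetical), a generic collector for prefixed meta properties might look like this:

```python
from html.parser import HTMLParser


class MetaPropertyParser(HTMLParser):
    """Collects <meta property="..." content="..."> values for chosen prefixes."""

    def __init__(self, prefixes=("og:",)):
        super().__init__()
        self.prefixes = tuple(prefixes)
        self.properties = {}  # property name -> list of content values, in page order

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        prop, content = a.get("property"), a.get("content")
        if prop and content is not None and prop.startswith(self.prefixes):
            # Keep every value: a property such as og:image may legitimately repeat.
            self.properties.setdefault(prop, []).append(content)
```

Passing `prefixes=("og:", "twitter:", "profile:")` would pick up the other meta tags on this page with the same code, which is roughly what "more generic" could mean here.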
<meta content="summary" property="twitter:card" />
And the Twitter card? I feel like something is missing here.
<meta content="email@example.com" property="profile:username" />
I don’t recognize the profile: prefix offhand (and neither does prefix.cc), but I guess it’s fairly self-explanatory. It’s a version of the username that includes the domain name, which is maybe worth noting.
Okay, Wirebird goes to its queue, and looks at the RSS feed.
<title>Karen C🥨👵🏻🌲🏖️ (@firstname.lastname@example.org)</title>
<description>4.34K Toots, 277 Following, 440 Followers · Wandering ex-Jayhawker (not the same as a Jayhawk, but close), currently in Jersey (Philly area). Freelance Perl coder. She/her.</description>
<webfeeds:logo>https://mastodon.social/packs/logo-fe5141d38a25f50068b4c69b77ca1ec8.svg</webfeeds:logo>
<webfeeds:accentColor>2b90d9</webfeeds:accentColor>
<image>
  <url>https://files.mastodon.social/accounts/avatars/000/007/209/original/1a8cc570acbd05ea.png</url>
  <title></title>
  <link></link>
</image>
<webfeeds:icon>https://files.mastodon.social/accounts/avatars/000/007/209/original/1a8cc570acbd05ea.png</webfeeds:icon>
<webfeeds:cover image="https://files.mastodon.social/accounts/headers/000/007/209/original/media.jpg"/>
Not much we don’t already know… except the number of Toots (statuses) has changed, so technically the bio has changed. I pulled these files on different days, but this could happen even during a single session.
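One possible special case for this: strip the statistics prefix before comparing bios. The sketch below is Python, not Wirebird code, and the pattern is only my guess from the two samples above, not a documented Mastodon format.

```python
import re

# Matches a leading "4.36K Toots, 277 Following, 440 Followers · " style prefix.
STATS_PREFIX = re.compile(
    r"^\s*[\d.,]+[KM]?\s+Toots,\s*"
    r"[\d.,]+[KM]?\s+Following,\s*"
    r"[\d.,]+[KM]?\s+Followers\s*·\s*"
)


def strip_stats(description):
    """Return the description without the leading statistics, if any."""
    return STATS_PREFIX.sub("", description, count=1)


BIO = ("Wandering ex-Jayhawker (not the same as a Jayhawk, but close), "
       "currently in Jersey (Philly area). Freelance Perl coder. She/her.")
html_description = "4.36K Toots, 277 Following, 440 Followers · " + BIO
rss_description = "4.34K Toots, 277 Following, 440 Followers · " + BIO

# The raw strings differ, but the bios underneath are the same.
same_bio = strip_stats(html_description) == strip_stats(rss_description)
```

A description without the prefix passes through unchanged, so the same function could be applied to the poco:note as well.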
This also adds a cover image, and a logo, the latter of which doesn’t actually seem to be defined in the linked webfeeds definition. It’s also the site logo, and not my (nonexistent) Personal Brand(tm), but there’s no real way to tell the difference.
Back to the queue, and the Atom feed listed at the top of the entry. Not much new info there either, other than some better-defined fields. (Aside: it might be interesting to have Wirebird reverse-engineer owl:sameAs properties based on redundant data like this.) Interestingly, Masto claims that email@example.com is my email address, but sending email doesn’t work (of course I tried). We finally get to see the unadulterated bio in the form of a poco:note.
Portable Contacts, by the way, is a problematic standard; the original creators let the domain go and now claim it’s been cloned by its new owner for nefarious purposes. To an extent it doesn’t matter because for performance reasons everyone maintains a cached version of any machine-readable standards they use - or more often just hard-codes everything.
Onward to the Activity JSON. It’s too long to embed here, but there is plenty of new information: links to follower/following lists, inboxes, outboxes… and a public key. Wirebird (with no special-case handling for Activity yet) doesn’t know what to do with the links since they’re not “alternate” types, so they don’t go on the queue, just in the stash of Things We Know About People.
And that’s a fun thing that machine-readable standards can do: a version of Wirebird may not know what something is, but it knows enough to store it correctly. A plugin or future version may come along and say “ah ha, I know what a salmon endpoint is, now I can allow my user to comment on more things!”
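To make the "store it now, understand it later" idea concrete, here is a small Python sketch (again not Wirebird's Perl). The `Stash` class, the use of bare rel names as predicates, and the `ingest_link` helper are all hypothetical; a real store would use full IRIs.

```python
from collections import defaultdict


class Stash:
    """Tiny in-memory triple store: subject -> predicate -> set of objects."""

    def __init__(self):
        self._triples = defaultdict(lambda: defaultdict(set))

    def add(self, subject, predicate, obj):
        self._triples[subject][predicate].add(obj)

    def objects(self, subject, predicate):
        # .get() avoids creating empty entries for things we never stored.
        return set(self._triples.get(subject, {}).get(predicate, ()))


def ingest_link(stash, queue, subject, rel, href, known_rels=("alternate",)):
    """Always remember the link; only queue it if this version understands rel."""
    stash.add(subject, rel, href)
    if rel in known_rels:
        queue.append(href)


# Today's version stores the salmon endpoint without knowing what it is.
stash, queue = Stash(), []
me = "https://mastodon.social/users/gamehawk"
ingest_link(stash, queue, me, "salmon", "https://mastodon.social/api/salmon/7209")
ingest_link(stash, queue, me, "alternate", "https://mastodon.social/users/gamehawk.atom")

# A later plugin that does understand salmon can simply ask for it.
salmon_endpoints = stash.objects(me, "salmon")
```

Nothing has to be re-fetched when the plugin arrives, because the data was stored under its own name the first time around.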
At any rate, now Alyssa knows a lot about me. And this is a process that I should use for building the base profile, because my profile on the card I posted a week ago is pretty sparse (though at least it’s added a “Member Since” date, still in ISO format). It may scare people to see Wirebird build a shadow profile for them, though, and I’m not sure if that’s good or bad. It’s all public information, after all.
* sudo cpan install RDF::RDFa::Parser chokes on a slightly-obscured error due to the absence of Module::Package. Module::Package’s install chokes due to… the same error. There’s a year-old open issue (and an offered patch) related to the fact that current Perls no longer include dot (the current directory) in @INC. This was a little distressing - I figure that when installer packages start to fail and don’t get fixed, it’s a signal that new installs are no longer being made. But eventually I realized that cpanm works fine, and I’m probably just a dinosaur for not using it.
Also I guess since even I didn’t run across the problem until I tried to install a not-entirely-finished module from 2012, it’s not that significant.