I didn’t get as far into cleanup as I had hoped today - I got the basic documentation up to date and cleaned up a lot of orphaned subroutines (and entire libraries), but the only additional testing I did was to tack Test::Pod::Coverage onto the end of the lonely initial test routine.
It’s not awful — Birdfeeder.pl is essentially a test suite of its own, lacking only the Test::More wrapper and deliberately-wrong attempts (which is a major lack, admittedly). Over the course of next week, I’ll move Birdfeeder into a test script and remedy that lack, and also move all my TODO comments into tests.
The docs are still not-great, but there’s at least a little description of each subroutine.
Not literally, just visually:
The database preloader now builds the site, subscribes to one feed (this blog, using the same call as is performed when POSTing to the SubscriptionList), and polls that feed once.
It’s still pretty hacky, but it works. Theoretically I could write a little cron job to continue to poll the feed(s), and the inbox would be continuously updated. I could subscribe to more feeds, and so forth.
I’m getting, as the currently fashionable phrase has it, way out over my skis when it comes to tests, documentation, and so forth, so tomorrow will be all housecleaning. Next week, all kinds of bug fixes. Here are some of the big ones.
I ran into a little complication with making new Wirebird::Resource objects that don’t appear in the main store yet, and it’s causing me to reconsider some things. First, a little background.
RDF triples can refer to URIs which are not necessarily URLs - that is, identifiers that do not necessarily locate a thing that can be served by a web server.
For instance, I might have a URI that identifies a pipe. Not a picture of a pipe, but an actual physical object. It might be a resolvable http URI that looks like a URL until you ask the server for it, at which point the server responds with a 303 See Other pointing to the URI of an image. When the browser requests that URI, it is also a URL, so the server can serve a JPG located there. Or the dance might continue: the server could serve another 303 pointing to a URL for a web page about the image.
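To make the 303 dance concrete, here is a toy Python model. It is not Wirebird code, which is Perl. A made-up table stands in for the web server, and every URI in it is hypothetical. The model follows 303 See Other responses until something is actually served, which is the same walk a browser or crawler would do.

```python
# Toy model of the 303 See Other "dance" for a URI that names a physical pipe.
# Every URI and response below is made up for illustration.
RESPONSES = {
    # The pipe itself can't be served, so the server points elsewhere.
    "http://example.org/id/pipe": (303, "http://example.org/img/pipe"),
    # The image could be served directly as a JPG, but here the dance continues...
    "http://example.org/img/pipe": (303, "http://example.org/page/pipe"),
    # ...to an HTML page about the image, which finally gets a 200.
    "http://example.org/page/pipe": (200, "text/html"),
}


def resolve(uri, max_hops=5):
    """Follow 303 responses until something is served.

    Returns (final_url, media_type, chain_of_uris_visited).
    """
    chain = [uri]
    for _ in range(max_hops + 1):
        status, value = RESPONSES[uri]
        if status == 200:
            return uri, value, chain
        if status != 303:
            raise ValueError(f"unexpected status {status} for {uri}")
        uri = value
        chain.append(uri)
    raise RuntimeError(f"gave up after {max_hops} redirects")
```

The starting URI names the pipe and never gets a 200 of its own. Only the URLs at the end of the chain do.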
I’ve kind of hand-waved this: “I’m building a RESTful system, so of course all my URIs need to be URLs” etc. etc. But then Wirebird::Handler::SubscriptionList accepted a URI that my browser POSTed and… now what?
It built a child Wirebird::Resource, but the only URI it had was for the Atom feed of this blog. I had been building the URL for the new resource from the URL of the blog itself - the home page, not the feed, since a blog’s feed location can change when its software does. But I clearly don’t have that until I have retrieved the feed, and I don’t retrieve the feed until I have a Wirebird::Resource to do it from, and if I’m now storing the feed data in an RDF store I need the subject URI. But in this case that URI depends on the SubscriptionList’s URI, so I clearly can’t use the Wirebird::Resource that’s in front of me.
So while I was debating that, I just gave the triples the feed URL, and things are fine. So far.
<http://wirebird.com/atom.xml> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdfs.org/sioc/ns#WebLog> .
<http://wirebird.com/atom.xml> <http://rdfs.org/sioc/ns#name> "Wirebird" .
<http://wirebird.com/atom.xml> <http://rdfs.org/sioc/ns#feed> <http://wirebird.com/atom.xml> .
<http://wirebird.com/atom.xml> <http://rdfs.org/sioc/ns#has_container> <http://habrok:6809/users/silver/subscriptionlist> .
<http://wirebird.com/atom.xml> <http://rdfs.org/sioc/ns#link> <http://wirebird.com/> .
Eventually I’ll want to have a URI (that is also a URL) for the local representation of the feed, because it will be much easier to issue a DELETE against that representation than to PUT an entire copy of the subscription list minus that one feed. (In fact, for a multi-user system I’ll want multiple URLs representing not the feed but a single user’s subscription to that feed.) But for the individual feed entries, maybe not - they can live in the store under their native URLs, and each subscriber’s Inbox can have a Create action with the entry as the object. So maybe my store has URIs that aren’t (local) URLs.
That said, I think the way to go here is to not have Wirebird::Resource or the handler plugins handle those non-local resources - I think I’m going to break Wirebird::Resource::pollRemote into a whole separate object. With its own plugin systems to handle all the things it might encounter in a wild URL - XML syndication feeds, RDFa-enabled web pages, JSON, HTML never designed for machine readability, media types, and so forth.
Truly, I have a dazzling intellect.
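Here is a rough sketch of what a standalone remote fetcher might look like. It assumes plugins are keyed by media type and that each one turns a fetched body into triples. It's in Python for brevity. The class name, the plugin signature, and the example triples are all hypothetical, not the eventual Wirebird API.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]
# A plugin turns (url, body) into triples; this signature is hypothetical.
Plugin = Callable[[str, str], List[Triple]]


class RemoteFetcher:
    """Sketch of pollRemote split out into its own object.

    Plugins are keyed by media type; anything unrecognized goes to a
    fallback (think "HTML never designed for machine readability").
    """

    def __init__(self, fallback: Plugin):
        self._plugins: Dict[str, Plugin] = {}
        self._fallback = fallback

    def register(self, media_type: str, plugin: Plugin) -> None:
        self._plugins[media_type.lower()] = plugin

    def process(self, url: str, media_type: str, body: str) -> List[Triple]:
        # Drop parameters such as "; charset=utf-8" before looking up a plugin.
        base_type = media_type.split(";")[0].strip().lower()
        plugin = self._plugins.get(base_type, self._fallback)
        return plugin(url, body)


# Illustrative wiring: a feed plugin plus a catch-all fallback.
fetcher = RemoteFetcher(lambda url, body: [(url, "rdf:type", "foaf:Document")])
fetcher.register("application/atom+xml",
                 lambda url, body: [(url, "rdf:type", "sioc:WebLog")])
```

Dispatching on media type is just the simplest option. If two plugins want the same type (RDFa-enabled HTML versus plain HTML, say), a bidding scheme like the one the handler plugins use could replace the dictionary lookup.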
Still doing a lot of reading, but also doing a little coding.
I’ve been treating the RDF store(s) a bit like a key/value store rather than getting into any more depth, and to an extent that won’t change just yet. But I’ve been thinking about how Wirebird::Resource needs to handle its data, and I’ve made some changes.
The authoritative store is an RDF::Trine::Store::DBI (in my case, Postgres-backed, but it doesn’t really care - in fact, it could be a remote store or whatever). For performance reasons, Wirebird::Resource pulls in a memory version of that. It’s been a hashref (kind of JSON-LD) but I’m also playing around with using an RDF::Trine::Model based on a memory store. So right now it’s got all kinds of things in there since the original versions of all the Wirebird::Resource subroutines are still in Wirebird.pm waiting to be cleaned up.
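One way to picture the split between the authoritative store and the memory copy: the resource copies every triple about its subject into a private set, works on that, and writes back on save(). This is a Python sketch with a plain set standing in for RDF::Trine::Store::DBI. The replace-everything save() strategy is my assumption; the post doesn't specify how changes get written back.

```python
class AuthoritativeStore:
    """Stand-in for the DBI-backed store: a set of (s, p, o) tuples."""

    def __init__(self, triples=()):
        self.triples = set(triples)

    def match(self, subject=None, predicate=None, obj=None):
        # None acts as a wildcard, loosely like a triple-pattern lookup.
        return {
            (s, p, o)
            for s, p, o in self.triples
            if (subject is None or s == subject)
            and (predicate is None or p == predicate)
            and (obj is None or o == obj)
        }


class Resource:
    """Works on an in-memory copy; writes back only on save()."""

    def __init__(self, store, uri):
        self.uri = uri
        self._store = store
        self.model = set(store.match(subject=uri))  # private memory copy
        self._dirty = False

    def values(self, predicate):
        return sorted(o for s, p, o in self.model if p == predicate)

    def add(self, predicate, obj):
        self.model.add((self.uri, predicate, obj))
        self._dirty = True

    def save(self):
        if not self._dirty:
            return
        # Swap out everything the store holds about this subject.
        self._store.triples -= self._store.match(subject=self.uri)
        self._store.triples |= self.model
        self._dirty = False
```
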
The progress toward a minimal feed reader is a little slow at the moment. Technically, you can submit an url into the form, but it doesn’t get very far - it will announce in the console that it’s found an XML::Feed, but doesn’t process it just yet. That’ll be the next step, along with the subroutine cleanup.
Still not a lot of code change today, again in part because of lack of time and because I’ve been doing more reading than writing. Here’s what’s been occupying me, and why.
On the one hand, SPARQL (“sparkle,” don’t @ me) is the real power that storing everything in RDF triples gives. On the other hand, there’s some tension in that I want to keep Wirebird lean enough to run in the background of a desktop machine, or on a Raspberry Pi, or even directly on a phone. Still, I want to be able to call on it for special things like search requests and such, so I’ve been reading Learning SPARQL.
I’ve also been reading Programming the Semantic Web which is a little depressing since it keeps referring to services that don’t exist anymore (it’s a 2009 edition). And all its examples are in Python, which… I really don’t want to change horses midstream, but Python just keeps coming up as a good thing to know.
Up next is Semantic Web for the Working Ontologist which I haven’t actually looked into yet, but the subtitle sounds promising.
(Those Amazon links are affiliate ones. Fun fact: I’ve had that affiliate link for ten years or so and have yet to reach the minimum for a payout.)
Another short “day,” but there is now a Wirebird::Resource object instead of resources being passed around in Wirebird.pm as various bits and pieces at different levels of processing. I haven’t pruned that stuff out of the original library yet, and some of the POD in the new library doesn’t reflect the change, but it’s getting there.
When a Resource object is created, it can either have an url or a hashref passed to it. Eventually the hashref will need to be an object holding JSON-LD, RDF, or some other recognized format but right now it’s the pseudo-JSON hash that I started out using.
The Resource uses plugins in the Wirebird::Handler:: hierarchy, of which there is exactly one (partial) so far.
Wirebird::Handler::SubscriptionList has a priority() sub which will check to see if the resource has a sioc:SubscriptionList type, at which point it returns a 100. Not exactly subtle. bidCreate() will, similarly, return 100 as its bid on handling something posted to a SubscriptionList, and create() is the handler.
Once Wirebird::Handler::Container exists, it will return a lower priority() and bidCreate() when it sees a SubscriptionList.
Wirebird::Resource will process down the handlers in descending bid order, and each bidder’s create() can look at the resource’s current status() (which starts as 501 Not Implemented) to decide whether to do further processing. For example, it’s fairly likely that Container will see that the SubscriptionList handler has changed the status to a 3xx pointing to the new resource and not do anything further… or maybe SubscriptionList doesn’t set the has_container/container_of because it knows Container will do it down the line.
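Here is a Python sketch of the bid-and-fall-through order described above. bid_create() and create() stand in for the Perl bidCreate() and create() subs. The child URL scheme is made up, and the handlers log what they did so the order is visible.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Resource:
    uri: str
    types: List[str]
    status: int = 501            # Not Implemented until some handler acts
    location: Optional[str] = None
    log: List[str] = field(default_factory=list)

    def post(self, handlers, posted):
        """Offer a POST to every handler, highest bid first."""
        bids = [(handler.bid_create(self), handler) for handler in handlers]
        for bid, handler in sorted(bids, key=lambda pair: pair[0], reverse=True):
            if bid > 0:          # zero means "not interested"
                handler.create(self, posted)
        return self.status


class SubscriptionListHandler:
    def bid_create(self, resource):
        return 100 if "sioc:SubscriptionList" in resource.types else 0

    def create(self, resource, posted):
        if resource.status != 501:
            return               # someone else already handled it
        # Hypothetical child-URL scheme, purely for illustration.
        resource.location = f"{resource.uri}/{len(resource.log) + 1}"
        resource.status = 303
        resource.log.append("SubscriptionList created child")


class ContainerHandler:
    CONTAINER_TYPES = {"sioc:Container", "sioc:SubscriptionList"}

    def bid_create(self, resource):
        return 10 if self.CONTAINER_TYPES & set(resource.types) else 0

    def create(self, resource, posted):
        if 300 <= resource.status < 400:
            resource.log.append("Container skipped: already redirected")
            return
        resource.status = 201
        resource.log.append("Container created child")
```

Both handlers run, so the lower bidder still gets its chance to act. It just checks the status first.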
Right now, Wirebird::Resource doesn’t do anything with something put out to bid that still returns 501, but as the TODO says, eventually it will be able to say “okay, according to the standard this X is a valid resource, it points to the Y in its foo_of attribute, and Y has a valid inverse attribute of has_foo that X is a valid value for, so… save it.”
(This assumes the user is authorized to POST anything to Y, of course.)
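Here is roughly what that generic “X points at Y, Y can point back” check could look like in Python. The VOCAB table is a hypothetical stand-in, and its domains and ranges are illustrative, not SIOC's published ones. Authorization is deliberately left out, as in the parenthetical above.

```python
# Hypothetical vocabulary table. These domains/ranges are illustrative only;
# they are NOT SIOC's published ones.
VOCAB = {
    "sioc:has_container": {"inverse": "sioc:container_of"},
    "sioc:container_of": {
        "inverse": "sioc:has_container",
        "domain": {"sioc:Container", "sioc:SubscriptionList"},  # who may carry it
        "range": {"sioc:Item", "sioc:WebLog"},                  # what it may point at
    },
}


def can_auto_save(x, y, link):
    """True if x points at y via `link`, and y's type may carry the inverse
    attribute with x as its value. Authorization is checked elsewhere."""
    if y["@id"] not in x.get(link, []):
        return False
    inverse = VOCAB.get(link, {}).get("inverse")
    if inverse is None:
        return False
    rules = VOCAB.get(inverse, {})
    y_ok = bool(rules.get("domain", set()) & set(y.get("@type", [])))
    x_ok = bool(rules.get("range", set()) & set(x.get("@type", [])))
    return y_ok and x_ok
```
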
Wirebird::Handler::SubscriptionList could delegate most stuff right back to Wirebird::Resource. The one special thing it does is handle a bare, remote url. (I mean, it doesn’t yet, but that’s what it’s there for.)
Subscribing to something is a pretty close analog to an ActivityPub Follow, and I’m somewhat inclined to make it one. There are a couple of potential gotchas, though: an Actor’s Following list is generally public, where a person’s Atom/RSS subscriptions generally aren’t. SIOC can handle this because there is no definition of how many SubscriptionLists a user can have. ActivityPub’s spec doesn’t seem to exclude multiple outboxes, but I wouldn’t care to predict what a client’s reaction to that might be.
Both SIOC and ActivityPub are pretty clear that a person can have multiple UserAccount/Actor objects linked to them, so that’s probably where things should branch: “this is my UserAccount for feed reading; it’s not visible to the public.” Two UserAccounts can point to the same Inbox, so that would work fairly transparently to the user. For now, I’m not going to move it anywhere but I guess I’ll have to keep it in mind.
No commit today; I only have three or four hours a day, tops, to put into Wirebird and I’ve been reading Learning SPARQL rather than coding. I’ve gone back to the tests which, if you are looking at the current-as-I-write-this repo, don’t at all match the direction Wirebird.pm took. Getting those up to date is particularly important to me because there are times I don’t get to work on code at all for long stretches, and picking up on the last failing test is the only “to do list” I have found that works for me.
I also am thinking of a structural change: instead of passing around the $data (or $json as it is usually called in the code) hashref holding the current resource item, I may make it a full-blown object. Making the Wirebird object itself represent the resource may or may not work (since sometimes it’s working with a parent and child resource or the like), so I’m thinking of a Wirebird::Resource object, with the plugin system attached to that. So I want to let that simmer a bit before moving forward with code.
As I mentioned, to start with we’ll just pre-populate the database rather than worrying about the interface up to that point.
Birdfeeder.pl is the script that will prepack the database. It doesn’t create the database; that’s a generic Postgres database that has had RDF::Trine::Store::DBI’s new() run against it. Which, yeah, should really happen in either the script or in Wirebird.pm itself.
Anyway, it’s a hacky little script that does some very, very minimal error catching and debugging, but until there’s real validation happening inside Wirebird.pm it really doesn’t matter.
Birdfeeder makes a site root page, a usergroup page, a user, an inbox, an outbox, and a subscriptionlist. Since I don’t have a front end going, the templates for those page types have some generic html forms on them. All we really care about right now is that the subscription page has a form where you can post an url to the subscription page. (The form could be anywhere, but KISS for now.)
The templates are pretty utilitarian, designed to give me as much debug info as possible rather than to look like the finished project. Over in the sidebar is the JSON representation of the data. Here’s the site root:
Again, for a single-user site this would probably be a control panel, or maybe even just the Inbox directly. This installation has auto-populated itself from the machine name (habrok) and my username (silver).
A usergroup of one.
Birdfeeder just directly calls the putPage() function, which gets called (after some authen and preliminary authz) by the Dancer app. If the validation were all in place, this would literally be all that’s required for the minimum viability I described in the first post: it’s a server that serves and accepts structured JSON (only, so far) RESTfully. Granted, this offloads a lot onto the as-yet-imaginary client, but there are standards, and they work.
(I mean, if they’re properly implemented they work. Currently Wirebird
cheerfully responds to that form submission with a simple
obviously I should blog less, debug more.)
Influenced by Jacky’s public development, I’m throwing my own work-in-progress out there. Not livestreaming my coding, though, nobody wants to see that.
I’ll be putting a new repo up on GitLab real soon now. So far it’s not much, though I have a lot of other code waiting to get converted over from JSON files to the RDF database store.
What it does right now:
Wirebird.pm is the library, and is mostly a conglomeration of helper functions and syntactic sugar over a Postgres-based RDF::Trine store. (When Attean gets a little more mature, I’ll probably switch over.)
Although RDF can store some complicated structures, nothing Wirebird builds so far is more complicated than what can be described in basic JSON-LD. So for speed, most of the HTML and whatnot is built and served by slurping up the static JSON file.
The library is doing some basic RESTful verb handling. The framework for validating incoming data is in place but not doing anything yet (the inline comments address some of this). As I described in my first post on the subject, the nature of REST+structured data means the generic handler can just Do The Right Thing without special custom coding, at least within a user-controlled space.
For times when a little customization is needed, such as when an authenticated user is trying to POST something outside the user’s directory, the library has a plugin system. Each plugin will have a subroutine where it examines the JSON file for that url and comes back with a priority bid. So maybe you post a sioc:Weblog to a sioc:SubscriptionList. The plugin for generically posting to a sioc:Container throws a lowball bid, but the one specifically for Weblogs to SubscriptionLists says “YES THAT IS MINE!” Same goes for rendering - generically the library looks for a template that matches the @type of the data, but a plugin could easily say “You know what, this is an Inbox and not just any old Container. Use a custom template.”
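Template selection could follow the same bidding idea. In this Python sketch, a plugin returns a (bid, template) pair or None. With no bids, the generic path picks the first @type that has a matching file (sioc:Container maps to container.html). The file-naming rule and the “URL ends in /inbox” test are assumptions for illustration.

```python
def choose_template(resource, plugins, available):
    """Pick a template name for `resource` (a JSON-LD-ish dict).

    Each plugin returns (bid, template_name) or None; the highest bid wins.
    With no bids, fall back to the first @type that has a template.
    """
    bids = [b for b in (plugin(resource) for plugin in plugins) if b is not None]
    if bids:
        return max(bids, key=lambda b: b[0])[1]
    for rdf_type in resource.get("@type", []):
        candidate = rdf_type.split(":")[-1].lower() + ".html"
        if candidate in available:
            return candidate
    return "default.html"


def inbox_plugin(resource):
    # Hypothetical rule: treat anything whose URL ends in /inbox as an Inbox.
    if resource.get("@id", "").rstrip("/").endswith("/inbox"):
        return (100, "inbox.html")
    return None
```
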
Anyway. Not much of this is specifically relevant to the first step but you know, gotta build on a solid foundation.
Rubberducking ahead! Mostly I’m recording my thought process so I don’t reinvent the wheel later.
The first decisions that need to be made here are about file structure.
The root will be a
sioc:Site (and also a
single-user sites this will probably display recent blog posts or
whatever in the HTML, but the linked data will be the site info.
Four (five, if you count HEAD) HTTP verbs are legal, auth permitting:
* GET/HEAD - Self-evident, and generally permitted regardless of authentication.
* PUT - Will update the non-list site attributes.
* POST - Will add list-type site attributes.
* DELETE - Will delete the resource. (Pretty rare at the site level, hopefully.)
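Here is a sketch of those verb rules against a Site held as a JSON-LD-ish dict. It assumes the four list attributes are SIOC's space_of, has_usergroup, has_administrator and host_of. The status codes and the POST body shape ({"attribute": ..., "value": ...}) are my own placeholders.

```python
# Assumed list-valued Site attributes (SIOC property names).
LIST_ATTRIBUTES = {"sioc:space_of", "sioc:has_usergroup",
                   "sioc:has_administrator", "sioc:host_of"}


def handle(site, method, body=None, authorized=False):
    """Return (status, resulting site) for a request against the Site."""
    if method in ("GET", "HEAD"):
        return 200, site                     # readable without authentication
    if not authorized:
        return 403, site
    if method == "PUT":
        # Replace the non-list attributes; list attributes are left alone.
        updated = {k: v for k, v in site.items() if k in LIST_ATTRIBUTES}
        updated.update({k: v for k, v in body.items() if k not in LIST_ATTRIBUTES})
        return 200, updated
    if method == "POST":
        attribute, value = body["attribute"], body["value"]
        if attribute not in LIST_ATTRIBUTES:
            return 501, site
        updated = dict(site)
        updated[attribute] = list(site.get(attribute, [])) + [value]
        return 201, updated
    if method == "DELETE":
        return 204, None                     # hopefully rare for a whole site
    return 405, site
```
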
Four attributes are lists of items:
space_of: Any resource. That doesn’t narrow things down much, but a sioc:Space (which a Site is a specific type of) can hold… anything.
has_usergroup: sioc:UserGroup. A UserGroup is just a group of members; literally the only unique attribute it has is has_member. Like space_of, has_usergroup is inherited from Space.
has_administrator: sioc:UserAccount. This is a Site-specific attribute, and conceivably a single-user site could have no UserGroup, just a direct administrator.
host_of: sioc:Container. A Container is anything that contains other things, so that would be a Blog or Forum. (Why a UserGroup isn’t just another Container I do not know.)
Most random resources POSTed here are going to bounce with a 501 Not Implemented. I’m sure things will come up down the line that just need to be posted to the site generically, but I can’t think of any offhand.
I am inclined to say a Wirebird will have a default UserGroup at /users/, but if not, the client will have to POST one before UserAccounts can be created.
POSTing a UserAccount directly to the Site will (auth permitting) promote that UserAccount to administrator status, creating it if necessary (and adding it to the default UserGroup).
Containers being POSTed here will be sitewide forums and such. For reading feeds, we won’t need those yet.
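Here is a sketch, in the same Python style, of routing a POST to the Site by the @type of whatever was posted. The default group URL, the members bookkeeping, and the 409 Conflict returned when no UserGroup exists yet are all assumptions. The post only says the client would have to POST a UserGroup first.

```python
DEFAULT_GROUP = "http://habrok:6809/users/"  # hypothetical default UserGroup


def post_to_site(site, item):
    """Route a POSTed item by its @type. Mutates `site`; returns a status."""
    types = set(item.get("@type", []))
    if "sioc:UserGroup" in types:
        site.setdefault("sioc:has_usergroup", []).append(item["@id"])
        return 201
    if "sioc:UserAccount" in types:
        if DEFAULT_GROUP not in site.get("sioc:has_usergroup", []):
            # No default UserGroup yet: the client has to POST one first.
            return 409
        site.setdefault("sioc:has_administrator", []).append(item["@id"])
        # Hypothetical bookkeeping for "adding it to the default UserGroup".
        site.setdefault("members", {}).setdefault(DEFAULT_GROUP, []).append(item["@id"])
        return 201
    if "sioc:Container" in types:
        site.setdefault("sioc:host_of", []).append(item["@id"])
        return 201
    return 501  # most random resources bounce
```
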
HTTP verbs will look like the Site ones.
UserGroups only have one list-type attribute of interest:
Having urls built from usernames brings up the (well, an) endless REST debate: since the client is (probably) choosing the username, should it just build the url and PUT the data directly to whatever? I think I’m going with the former.
/@username may be aliases to the same thing, we’ll see.
SIOC doesn’t have users, just accounts. (Actual humans get to be
UserAccounts can have all sorts of attributes, but for our purposes we will only be concerned with:
sioc:Container. For the most part, Containers POSTed here will be already-existing things, but we’ll allow one type (so far) to be created by POST: a SubscriptionList.
Because we’re also going to be ActivityPub-compatible, a UserAccount will also be a streams:Actor. This means it needs an Inbox and Outbox.
Despite being a specific subtype, SubscriptionLists don’t have any different attributes than base Containers. Here again, the attribute we’re most interested in is:
This is where things get a little hairy. The Item is going to be a (representation of a) Weblog - does it live under the SubscriptionList? How about the Weblog’s Posts? For a single-user site, it doesn’t really matter, but for a multi-user site should there be a shared area? Would doing so be potential exposure of something (even if we restrict GET to subscribers)?
Our urls can get kind of long:
Or I could put it all in a shared space, which could reduce them to:
I know RESTful urls aren’t supposed to matter, but I think I’ll go with the latter form. The cache directory can itself be a Container, of course.
This seems like a lot of work to get all the way down to subscribing, which is as simple as… POSTing an url to the SubscriptionList. But for minimum-viable, most of the above stuff will get initialized on database creation and won’t need to be edited.
When the url gets POSTed, we hand things off to XML::Feed. If the target isn’t a feed itself, XML::Feed includes a find_feeds function. It also doesn’t care if it’s RSS or Atom; the interface is the same.
Once we have the feed parsed, it’s pretty simple to build the Item: whatever url find_feeds found goes in sioc:feed, $feed->title goes in sioc:name, and $feed->link goes in sioc:link.
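For illustration, here is the same mapping in Python, emitting N-Triples. In Wirebird the values come from XML::Feed ($feed->title, $feed->link), and the feed URL doubles as the subject, as in the triples shown earlier. Here they are plain strings. The “starts with http means IRI” rule is a shortcut that a real serializer wouldn't take, since a title that begins with “http” would be misread.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SIOC = "http://rdfs.org/sioc/ns#"


def weblog_triples(feed_url, title, link, subscription_list):
    """Triples for a newly subscribed feed; the feed URL is the subject for now."""
    return [
        (feed_url, RDF_TYPE, SIOC + "WebLog"),
        (feed_url, SIOC + "name", title),             # $feed->title
        (feed_url, SIOC + "feed", feed_url),          # what find_feeds found
        (feed_url, SIOC + "has_container", subscription_list),
        (feed_url, SIOC + "link", link),              # $feed->link
    ]


def term(value):
    # Shortcut: treat http(s) strings as IRIs and everything else as literals.
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(triples):
    return "\n".join(f"<{s}> <{p}> {term(o)} ." for s, p, o in triples)
```
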
The polling process is pretty simple: retrieve all the
subscriber_of and their
use XML::Feed to get the current feed for each, store each item in the cache, and put a link in each subscriber’s Inbox.
This last bit gets a little sticky, since ActivityPub expects Inbox entries to be wrapped in an Activity, but luckily it will assume a Create if it’s given just a naked Object. So each feed entry will be saved as a sioc:Post and a streams:Article, and eventually, when we have an ActivityPub API, we can read the Inbox directly that way.
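Here is a single polling pass, sketched in Python with a fake fetcher. Entries are cached once under their native URLs, and each subscriber's Inbox gets the entry wrapped in a Create. One hedge: as far as I can tell, ActivityPub's “assume a Create for a naked Object” rule (section 6.2.1) is written for objects a client POSTs to an outbox, so this sketch wraps Inbox entries explicitly. Skipping entries that are already in the cache is also my assumption, and the curie-style keys just mirror the post's prefixes.

```python
def as_create(obj, actor):
    """Wrap a naked object in a Create activity (prefixes as in the post)."""
    return {"@type": "streams:Create", "streams:actor": actor, "streams:object": obj}


def poll(subscriptions, fetch_entries, cache, inboxes):
    """One polling pass.

    subscriptions: {feed_url: [subscriber, ...]}
    fetch_entries: callable(feed_url) -> list of {"id": ..., "title": ...}
    cache:         {entry_id: stored object}, shared by all subscribers
    inboxes:       {subscriber: [activity, ...]}
    """
    for feed_url, subscribers in subscriptions.items():
        for entry in fetch_entries(feed_url):
            if entry["id"] in cache:
                continue  # already delivered on an earlier pass
            obj = {
                "@id": entry["id"],  # the entry keeps its native URL
                "@type": ["sioc:Post", "streams:Article"],
                "dcterms:title": entry.get("title", ""),
            }
            cache[entry["id"]] = obj
            for subscriber in subscribers:
                inboxes.setdefault(subscriber, []).append(as_create(obj, feed_url))
```
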
There is one thing that’s iffy about both RSS and Atom feeds, and that’s the unpredictability of what you end up with in the “author” fields. You can have authors attached to the individual entries or the overall feed, and the authors can be a plaintext string, an url, an email address, or various combinations thereof. And since the point of Wirebird is to be a social network, that’s not good. We may not care about reliably identifying the author of news articles, but if I’m reading a friend’s blog I want to be able to link that data properly.
But that’s for down the road.
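When it does come up, a first pass at taming the author field might look like this Python sketch. RSS 2.0 specifies an email address for the item author, commonly written as “email (Name)”. Others write “Name <email>” or just a URL. The patterns below are a heuristic, not a complete parser, and Atom's structured name/email/uri elements wouldn't need it at all.

```python
import re
from urllib.parse import urlparse

NAME_ANGLE_EMAIL = re.compile(r"^(?P<name>.*?)\s*<(?P<email>[^<>\s]+@[^<>\s]+)>$")
EMAIL_PAREN_NAME = re.compile(r"^(?P<email>[^\s()]+@[^\s()]+)\s*\((?P<name>[^()]*)\)$")
BARE_EMAIL = re.compile(r"^[^\s@]+@[^\s@]+$")


def normalize_author(raw):
    """Split a free-form feed author string into name/email/uri (any may be None)."""
    result = {"name": None, "email": None, "uri": None}
    text = (raw or "").strip()
    if not text:
        return result
    parsed = urlparse(text)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        result["uri"] = text  # checked first: profile URLs may contain "@"
        return result
    match = NAME_ANGLE_EMAIL.match(text) or EMAIL_PAREN_NAME.match(text)
    if match:
        result["email"] = match.group("email")
        result["name"] = match.group("name").strip() or None
        return result
    if BARE_EMAIL.match(text):
        result["email"] = text
        return result
    result["name"] = text  # plain text: assume it's a display name
    return result
```
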