Some of my favorites on the new joost:
- Pink - So What
- Green Day - Boulevard of Broken Dreams
- Madonna - Give It 2 Me
- Beyonce & Shakira - Beautiful Liar
- Tooth & Claw Episode 1: WALRUSES
Let us know what you think of the new joost!
Look what I found on joost (disclaimer: I work there) hidden away in the cooking category:
oh, and once you’re past that first hangover…
Joost has a large library of video clips. For a while Joost was my primary source of music during the day, but I tend to use last.fm quite a bit these days, and now I only turn Joost on when I don’t mind being distracted by Nelly Furtado’s belly button.
The last.fm feature I like most is the music profile sharing. I’ve learned about so much cool new music just by looking at what my friends and colleagues listen to. Fortunately, Joost is now growing some social networking features of its own - these days if you click on a video on our website, we can send a notification to your facebook feed. I wonder what will happen now that everyone on facebook will know that I actually enjoy watching Rihanna and Fergie…
So, at work, we’re doing “next generation” versions of a bunch of our backoffice tooling. That involves producing cute little web applications that often control not so cute and not so little processes (like transcoding and publishing and whatnot). The coarse-grained architecture pattern is pretty simple and familiar: a database with information about files, jobs, tasks and metadata, some common libraries for interacting with the database, some web application middleware using those libraries, and a web server frontend serving up the middleware.
Pretty much normal bread-and-butter stuff. It’s not quite like document-based CMS work (you don’t really want to store many-gigabyte video in a JCR repo), but a lot of the technology choices are still similar.
Based on the various tech we have deployed today, and the skills of the people working on this kind of thing, we’re trying to standardize around two main server-side technologies: java and python. This post explains the choices we made for the python universe. At the moment those choices are actually not so easy, since there’s so much happening and so many projects are moving so fast. We scouted the web quite a bit to figure out what to do.
/etc/init.d, logs in all the right places, etc) behind an apache httpd 2.2 ProxyPass, which also handles SSL/AAA. We want to try and move to mod_wsgi but first we need its mac install to suck a bit less, and so far, cherry is not quite falling over on us. If mod_wsgi doesn’t work out it’ll probably be back to twisted, probably also behind apache for SSL reasons.

We took a look at a bunch of the web frameworks out there. We didn’t seriously consider zope, but we took a long stare at pylons, turbogears and django before deciding not to bother with them. We’re not using much of paste either. Basically we missed one or more of
And perhaps a few other things, and on the balance we guessed it would be easier to roll our own and integrate components, rather than strip something else down, and maintain lots of vendor branches.
Two years ago I would’ve picked twisted without blinking and invented another fancy wheel on top of it, but I’m happy I don’t have to do that anymore. Twisted has quite a learning curve, not just for app developers, but also for the people that need to deploy and scale the beast.
Two good things happened to the python webapp world: competition and standardization. Now things are progressing rapidly.
Progress is good, but it can result in various kinds of chaos that don’t help the application developer who likes to plan ahead a bit. The new scripting-language-based mega frameworks seem to attract a certain kind of developer and they probably work for a certain set of use cases, but standardizing on patterns and interfaces is much more useful for (opinionated!) people like us (with subtly deviating use cases). So framework authors: please do keep working on bridging the gap between all of them by cutting ‘em down into tiny little WSGI middleware bits and pieces, and turn frameworks into libraries where you can.
As I mentioned in a previous entry, Joost™ uses quite a bit of RDF. I’m sorry, but I’m not going to share our full data model with you (though we might do that in the future). All I want to try and do is highlight some basic choices that we (mostly Alberto) have made on how to model things using RDF.
Choice number one:
No bnodes
A blank node (definition) or “bnode” for short is when you have a subject in RDF that doesn’t have a ‘real’ URI.
You encounter bnodes when modelling things in a ‘normal’ object-oriented fashion, and especially a lot in ‘normal’ modern XML. For example, the XML document
<!--[CDATA[ ]]-->
might be turned into RDF as
<!--[CDATA[@prefix : .
leo isA Person ;
name "Leo Simons" .
alberto isA Person ;
name "Alberto Reggiori" .
foo isA Group ;
name "Mentioned in article" ;
containsPeople ( leo alberto ) .]]-->
which is a special Notation3 (or Turtle) shorthand for
<!--[CDATA[@prefix : .
leo isA Person ;
name "Leo Simons" .
alberto isA Person ;
name "Alberto Reggiori" .
foo isA Group ;
name "Mentioned in article" ;
containsPeople _:1 .
_:1 rdf:first leo ;
rdf:rest _:2 .
_:2 rdf:first alberto ;
rdf:rest rdf:nil .]]-->
_:1 and _:2 are bnodes. Doesn’t seem to be a problem with this, does there? (aside from rdf collections being cumbersome)
Well, consider this alternative:
<!--[CDATA[@prefix : .
leo isA Person ;
name "Leo Simons" .
alberto isA Person ;
name "Alberto Reggiori" .
foo isA Group ;
name "Mentioned in article" ;
containsPerson leo ;
containsPerson alberto .]]-->
It consists of fewer triples, which obviously means less storage space and, given the nature of RDF databases today, also better performance. As a data model grows in complexity, it seems that the percentage of bnodes will normally grow a bit as well, so the effect is more pronounced for lots of data.
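To make the triple counts concrete, here is a small sketch in plain Java. It does not use Jena: the `Triple` class and the two helper methods are made up for this illustration. It generates both encodings for the same group and counts the triples, assuming the rdf collection expands to one containsPeople triple plus an rdf:first/rdf:rest pair per member, as in the expanded listing above.

```java
import java.util.ArrayList;
import java.util.List;

public class TripleCount {
    /** A bare subject-predicate-object triple; hypothetical, not a Jena type. */
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
        @Override public String toString() { return s + " " + p + " " + o + " ."; }
    }

    /** Encodes membership as an rdf collection: one bnode list cell per member. */
    static List<Triple> asCollection(String group, List<String> members) {
        List<Triple> out = new ArrayList<>();
        if (members.isEmpty()) {
            // an empty collection is just rdf:nil
            out.add(new Triple(group, "containsPeople", "rdf:nil"));
            return out;
        }
        out.add(new Triple(group, "containsPeople", "_:1"));
        for (int i = 0; i < members.size(); i++) {
            String cell = "_:" + (i + 1);
            boolean last = (i == members.size() - 1);
            out.add(new Triple(cell, "rdf:first", members.get(i)));
            out.add(new Triple(cell, "rdf:rest", last ? "rdf:nil" : "_:" + (i + 2)));
        }
        return out;
    }

    /** Encodes membership as one repeated property per member: no bnodes. */
    static List<Triple> asRepeatedProperty(String group, List<String> members) {
        List<Triple> out = new ArrayList<>();
        for (String member : members) {
            out.add(new Triple(group, "containsPerson", member));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> members = List.of("leo", "alberto");
        System.out.println("collection:        " + asCollection("foo", members).size());
        System.out.println("repeated property: " + asRepeatedProperty("foo", members).size());
    }
}
```

For the two-person group this gives 5 membership triples against 2, and in general 1 + 2n against n for a group of n people.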
But, more importantly, the software you have to write becomes more involved. Let’s investigate.
Here’s some imaginary java code (using jena) that prints certain data it finds in the model:
private String[] getName(Model m, Resource r) {
    List<String> names = new ArrayList<String>();
    // listObjectsOfProperty returns an iterator, so turn it into a list for the for-each
    for(RDFNode nameNode : m.listObjectsOfProperty(r, Example.name).toList()) {
        if(!nameNode.isLiteral()) {
            continue;
        }
        Literal nameLiteral = (Literal)nameNode.as(Literal.class);
        try {
            names.add(nameLiteral.getString());
        } catch(DatatypeFormatException e) {
            // skip names whose lexical form doesn't match their datatype
        }
    }
    return names.toArray(new String[names.size()]);
}

private boolean hasType(Model m, Resource toCheck, Resource expectedType) {
    // RDF.type is jena's constant for rdf:type, written "isA" in the listings above
    return m.contains(toCheck, RDF.type, expectedType);
}

private void printHeader(Model m, Resource groupResource) {
    String[] groupName = getName(m, groupResource);
    if(groupName.length > 0) {
        for(String name : groupName) {
            System.out.println("Group name: " + name);
        }
    } else {
        System.out.println("Group name: ");
    }
    System.out.println("----");
}

private void printName(Model m, Resource personResource) {
    String[] personName = getName(m, personResource);
    if(personName.length > 0) {
        for(String name : personName) {
            System.out.println(" " + name);
        }
    } else {
        System.out.println(" ");
    }
}

public void printInfoAboutGroup(Model m, URI groupId) {
    ValidationUtil.checkNotNull(m, "m");
    ValidationUtil.checkNotNull(groupId, "groupId");
    Resource groupResource = m.getResource(groupId.toString());
    if(groupResource == null) {
        System.err.println("Warning: no such group: " + groupId.toString());
        return;
    }
    if(!hasType(m, groupResource, Example.Group)) {
        System.err.println("Warning: not typed as a group: " + groupId.toString());
    }
    printHeader(m, groupResource);
    //
    // NOTE: for loop in a for loop in a for loop
    //
    for(RDFNode groupHead : m.listObjectsOfProperty(groupResource, Example.containsPeople).toList()) {
        // containsPeople should point at the head of an rdf collection,
        // which jena exposes as RDFList (not as Container)
        if(!groupHead.canAs(RDFList.class)) {
            System.err.println("Warning: Group "+groupId.toString()+" containsPeople points to a non-list");
            continue;
        }
        RDFList list = groupHead.as(RDFList.class);
        for(RDFNode peopleNode : list.asJavaList()) {
            if(!peopleNode.isResource()) {
                System.err.println("Warning: Group "+groupId.toString()+" rdf:first points to a literal: " +
                    ((Literal)peopleNode.as(Literal.class)).getLexicalForm());
                continue;
            }
            Resource peopleResource = (Resource)peopleNode.as(Resource.class);
            if(!hasType(m, peopleResource, Example.Person)) {
                String identifier = (peopleResource.isAnon())?
                    peopleResource.getId().getLabelString() :
                    peopleResource.getURI();
                System.err.println("Warning: not typed as a person: " + identifier);
            }
            printName(m, peopleResource);
        }
    }
}
...]]>
Here’s the printInfoAboutGroup() method again, now for the RDF model structure from listing 4.
I suspect that, if you haven’t seen RDF-inspecting source code before, all of the above looks a little scary. There’s a load of looping and checking that you don’t have to take into account when using simple javabeans. This is the price to pay for the open world assumption, though of course a lot of it can be abstracted out in utility code a lot better than I’ve done above.
However, in the midst of all that java fluff, the difference should still be clear: Listing 6 has one fewer nested for loop than Listing 5. No matter how much you clean up this code, that fundamental difference remains, and the open world assumption makes it matter even more.
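The same difference shows up without any Jena in the picture. The sketch below is plain Java over an in-memory list of triples. The `Triple` class, the `objects()` helper and the method names are all hypothetical, made up for this illustration. Walking the collection encoding takes a loop over list heads, a loop over list cells and a loop over each cell's rdf:first values, plus checks for malformed lists; the repeated-property encoding needs a single lookup.

```java
import java.util.ArrayList;
import java.util.List;

public class LoopDepth {
    /** A bare subject-predicate-object triple; hypothetical, not a Jena type. */
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    /** All objects of (s, p, ?) in the model; there may be zero, one or many. */
    static List<String> objects(List<Triple> model, String s, String p) {
        List<String> out = new ArrayList<>();
        for (Triple t : model) {
            if (t.s.equals(s) && t.p.equals(p)) {
                out.add(t.o);
            }
        }
        return out;
    }

    /** Collection encoding: three nested loops plus malformed-list checks. */
    static List<String> membersViaCollection(List<Triple> model, String group) {
        List<String> members = new ArrayList<>();
        for (String head : objects(model, group, "containsPeople")) {
            String cell = head;
            int steps = 0;
            // the step counter guards against cyclic rdf:rest chains
            while (!cell.equals("rdf:nil") && steps++ < model.size()) {
                for (String first : objects(model, cell, "rdf:first")) {
                    members.add(first);
                }
                List<String> rest = objects(model, cell, "rdf:rest");
                if (rest.isEmpty()) {
                    break; // open world: nothing guarantees the list is terminated
                }
                cell = rest.get(0);
            }
        }
        return members;
    }

    /** Repeated-property encoding: a single lookup, no list walking. */
    static List<String> membersViaRepeatedProperty(List<Triple> model, String group) {
        return objects(model, group, "containsPerson");
    }

    static List<Triple> collectionModel() {
        return List.of(
            new Triple("foo", "containsPeople", "_:1"),
            new Triple("_:1", "rdf:first", "leo"),
            new Triple("_:1", "rdf:rest", "_:2"),
            new Triple("_:2", "rdf:first", "alberto"),
            new Triple("_:2", "rdf:rest", "rdf:nil"));
    }

    static List<Triple> flatModel() {
        return List.of(
            new Triple("foo", "containsPerson", "leo"),
            new Triple("foo", "containsPerson", "alberto"));
    }

    public static void main(String[] args) {
        System.out.println(membersViaCollection(collectionModel(), "foo"));
        System.out.println(membersViaRepeatedProperty(flatModel(), "foo"));
    }
}
```

Both calls find the same two people. The point is only how much more code, and how many more ways to go wrong, the first one needs to get there.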
Because of the open-world assumption, making use of bnodes is very expensive when doing real-world software development. Therefore, bnodes should be avoided. Compare:
- foo.bar.getXyz() vs. foo.getBarXyz()
- foo.getElementsByTagName("bar").getAttribute("xyz") vs. foo.getAttribute("barXyz")
- for(foo) { for(bar) { for(xyz) { ... }}} vs. for(foo) { for(barXyz) { ... }}

You can forget all of the above, just remember these rules:
Startups suck. Not suck as in suck ass, but suck as in suck you in. It’s an interesting experience.
I work at a company that somewhat resembles a startup. I’ve worked there (as an employee, that is) for 3 months now. Like many of my colleagues, I tend to work 60-hour weeks.
Before that, I was self-employed, and I worked 60-hour weeks, too, and I dare say with more stress. However, somehow I always managed to find some time to keep somewhat involved with some amount of open source stuff on the side, whereas now, I don’t even seem to find the time to read the members@apache.org mailing list.
It’s not because we don’t use open source software at work:
$ pwd
/Users/lsimons/dev2/tvp/bistro-trunk/external-libs
$ find . -name '*.jar' | grep -v '.svn' | wc -l
294
It’s also not because my employer doesn’t want me working on open source projects (quote: “if this thing starts taking up more than 30% of your time we should sort-of talk about it”).
It’s about the rhythm. Just about everyone in our little p2p video company works like crazy, yet parties like students (actually, the students tend to leave the bars way early in Leiden, whereas we often get kicked out). In many ways, coming into the office (which I tend to do for about two days a week, the rest I work from home) feels more like arriving at a mini-conference than like, ehm, coming into the office. It just sucks me in, and I like it that way.
What I do these days? Judging by flickr, crowd control, ranting, eating, drinking, and making a fool of myself (and others). I don’t think there’s a public picture to be found on flickr yet of me actually working. Hmm….
I guess I should be happy I don’t work at youtube; then the above links would’ve been to the even more embarrassing christmas dinner video footage, which, as far as I know, hasn’t made it off our SSL-secured intranet. Of course, if we do our job really well on the COW (short for Content Owner Website, which looks to be my main project in January), we might be seeing a video channel on TVP about TVP before I can come up with a ploy to prevent it…
PS: you’ll be missed, Mads. I’m sure you’ll find the job you’re looking for soon.
So, the secrecy is slowly being dropped.
As should be obvious by the sudden influx of people blogging about our little project, the company blogging policy was made available. Such. A. Relief. I hate secrecy. There’s only a few things (that interest me) which I can’t talk about, apparently. Yay!
So let’s mention something which might be interesting to my audience.
We make extensive use of RDF in different places. It all starts with a core RDFS/OWL schema that is used to capture various kinds of information (think FOAF + IMDb + RSS + a lot more). I suspect some parts of the modelling work that was done here will make it into future standards for online video.
We have a custom distributed digital asset management system (or DAM), built around jena-with-postgres at the moment for storage and (CRUD-like) management of all that RDF-ized information over a REST protocol.
We convert from RDF to different specialized XML formats and back again. We convert from RDF to excel spreadsheets and back again (ugh). We have our jira instance hooked up to our RDF store. We convert RDF to other kinds of RDF. We have custom RDF visualization tools. We have custom RDF store crawlers that do efficient validation. We have RDF schemas that control the behavior of other distributed systems by adding intelligence to the core schema. We do triple timestamping. We do intelligent schema-driven indexing. We have custom libraries to make doing wicket-based, RDF-based web application development easier. Oh, we do RDF-based web applications. In short, we do more RDF than you can shake a stick at. So not a day goes by without some of our developers swearing about “RDF” or “metadata”, since in many ways RDF still isn’t exactly mature technology. But we’ll fix the warts, and contribute those fixes back to the open source community.
In many ways, to me, the RDF part of our server architecture is much like WADI (I spent a year building the next version of it with asemantics before joining The Venice Project), with postgres instead of oracle, REST instead of SOAP, and a much less scary data model.
This RDF-based digital asset management (or DAM) seems like something everyone’s doing right now. For example, Sesame has HTTPSail now, and the Simile people have Semantic Bank (I know of several more examples I’m not sure I can mention here).
Since everyone is inventing roughly the same wheel at the same time, and some people have re-invented it several times now, it is clearly about time for an open source project that does RDF-over-HTTP, properly. I’ve been talking to various people about this for a while now, and a bunch of us are almost ready to approach the Apache Incubator with a proposal for a project to build a “sparql endpoint”. And The Venice Project will be donating some code (and developer time!) to seed this effort. Hopefully we will go from annoyingly secretive to actively open (and open source) in the space of a few weeks.
Now, back to work. Or rather, lunch, in London, with some people on our content team. See if the sushi is even better here than it was in New York.