30 December 2010
I have been toying around for a while now with various graph databases and triplestores for working with data on intercompany relations, aimed at exploring phenomena such as the concentration of media ownership and interlocking directorates. While software such as neo4j proves fun to play with, there has been little progress to report.
The main obstacle for me seems to be the entry of such network data. There is some room for automated and semi-automated entry of graph data, e.g. a hackish mix of IPython and surf has proven adequate while entering information on separatist Flemish organizations.
It has, however, been difficult to adapt this approach when collecting data with a different structure. Such a home-brew solution also lacks features that facilitate data entry, such as proper autocompletion, normalization, automatic coining of proper identifiers, linking with existing data, etc. This is why I turned again to Freebase (I have had an account since mid-2007).
Freebase can be described as “Wikipedia for structured data”. It is community edited, the data is released under a Creative Commons licence and it has recently been acquired by Google. It features both a nice dynamic interface for browsing and editing and an API with a set of “powertools”. A community of (brilliant) people obsessed with metadata has sprung up around it, along with nifty tools for working with that data, such as Google Refine (née Gridworks). And it already contains tons of structured information, entered by members or pulled in from library catalogues, Wikipedia, stock tickers, etc.
This combination makes it almost enjoyable to enter data. It is actually doable to use it as some kind of “notepad knowledge base”: while you are reading up on a person or organization, create or expand the topic a little. Chances are that when you are reading up on a different subject a few months later, the previously entered information will pop up and you will have discovered an interesting link. This is something I have not experienced while editing Wikipedia, which is more about well-rounded articles than structured data and relations.
As an example, I focussed on board members of the companies included in the BEL20 market index. I added most of the board members and their memberships, but many of the people were already included in Freebase (mostly derived from Wikipedia data). We can browse the different topics, but it is also possible to query Freebase for the specific bits of information we need and explore them in a different program.
The Python wrapper for the query API gives us convenient Python data structures to work with, and with the help of the Python graph manipulation library networkx we can turn information on board membership into an XML file Gephi can read (see code and output). We select not only the board members of companies in the BEL20, but also the people with whom they sit on the boards of other companies, thus providing second-degree links.
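A minimal sketch of the export step, assuming the query results have already been flattened into (person, company) pairs. The sample pairs are a handful of memberships mentioned in this post; the function names, the `kind` node attribute and the choice of GEXF as the XML format are illustrative assumptions, not necessarily what the original script does.

```python
import networkx as nx

# Hypothetical sample of (person, company) pairs. In practice these would
# be extracted from the data structures returned by the Freebase query.
MEMBERSHIPS = [
    ("Luc Bertrand", "ING"),
    ("Luc Bertrand", "Ackermans & Van Haaren"),
    ("Thomas Leysen", "Corelio"),
    ("Thomas Leysen", "Umicore"),
    ("Thomas Leysen", "UCB"),
    ("Jean-Luc Dehaene", "Umicore"),
    ("Jean-Luc Dehaene", "Dexia"),
]


def build_membership_graph(pairs):
    """Build an undirected person-company graph from (person, company) pairs."""
    graph = nx.Graph()
    for person, company in pairs:
        # The "kind" attribute (an assumed name) lets Gephi colour or
        # filter persons and companies separately.
        graph.add_node(person, kind="person")
        graph.add_node(company, kind="company")
        graph.add_edge(person, company)
    return graph


def export_gexf(graph, path):
    """Write the graph as GEXF, an XML format that Gephi can open."""
    nx.write_gexf(graph, path)
```

Calling `export_gexf(build_membership_graph(MEMBERSHIPS), "board_memberships.gexf")` would produce a file that can be opened directly in Gephi; the filename is arbitrary.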
With some basic layout and coloring in Gephi 0.7 we get the following graph (best viewed in full screen, use the enlarge icon in the bottom-right corner). The companies included in the BEL20 are in red and a little larger.
With the exception of the retail group Colruyt and the medical company Omega Pharma, every company in the BEL20 is interconnected. It was nice to see media companies and holdings that I entered in Freebase a while ago, such as Corelio and De Eik, show up (green nodes). Even nicer was the appearance of international companies that I did not enter myself (blue), which provided second- and third-degree connections, for instance through the boards of directors of Tupperware Brands and Baxter International.
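The company-to-company view and the observation about unconnected companies can be reproduced with a bipartite projection followed by a connected-components check. This is a sketch under assumptions: the memberships are a small subset taken from this post, and the Colruyt director is a made-up placeholder added only so that an isolated company appears in the sample.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Sample (person, company) pairs. "Hypothetical director A" is a
# placeholder, not a real board member.
MEMBERSHIPS = [
    ("Thomas Leysen", "Corelio"),
    ("Thomas Leysen", "Umicore"),
    ("Thomas Leysen", "UCB"),
    ("Jean-Luc Dehaene", "AB Inbev"),
    ("Jean-Luc Dehaene", "Umicore"),
    ("Jean-Luc Dehaene", "Dexia"),
    ("Hypothetical director A", "Colruyt"),
]

membership = nx.Graph()
membership.add_edges_from(MEMBERSHIPS)
companies = {company for _, company in MEMBERSHIPS}

# Project onto companies: two companies are linked when they share at
# least one director; the edge weight counts the shared directors.
interlocks = bipartite.weighted_projected_graph(membership, companies)

# Groups of companies that can reach each other through shared directors,
# largest group first.
components = sorted(nx.connected_components(interlocks), key=len, reverse=True)

# Companies that share no director with any other company in the sample.
isolated = sorted(c for c in interlocks if interlocks.degree(c) == 0)

print(components)
print(isolated)
```

Note that the projection discards the persons, so for questions about who creates a link the membership graph itself remains the better starting point.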
If we look at the membership graph (including persons) we see that for such a moderately sized network some people already have a decent number of edges (four or five), e.g. Luc Bertrand (ING, Ackermans & Van Haaren, Sofinim and AXE Investments), Thomas Leysen (Synvest, Tradicor, Corelio, Umicore and UCB), Arnoud de Pret de Calesberg (UCB, AB Inbev, Umicore and Delhaize Group) and Jean-Luc Dehaene (AB Inbev, Umicore, Dexia and Thrombogenics).
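Counting edges per person amounts to reading off node degrees in the membership graph. A short sketch, using three of the people and board lists named above as sample data; the dictionary layout and variable names are assumptions made for illustration.

```python
import networkx as nx

# Board seats per person, as listed in this post (sample only).
BOARDS = {
    "Luc Bertrand": ["ING", "Ackermans & Van Haaren", "Sofinim", "AXE Investments"],
    "Thomas Leysen": ["Synvest", "Tradicor", "Corelio", "Umicore", "UCB"],
    "Jean-Luc Dehaene": ["AB Inbev", "Umicore", "Dexia", "Thrombogenics"],
}

graph = nx.Graph()
for person, companies in BOARDS.items():
    for company in companies:
        graph.add_edge(person, company)

# Rank persons by number of board seats (their degree), highest first;
# ties are broken alphabetically by name.
ranking = sorted(
    ((person, graph.degree(person)) for person in BOARDS),
    key=lambda item: (-item[1], item[0]),
)

for person, seats in ranking:
    print(f"{person}: {seats}")
```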
A strong cluster with multiple shared directors can be seen between GDF Suez, Groupe Bruxelles Lambert and NPM/CNP. This is not that surprising given the central role of Albert Frère in the latter two enterprises and his connections with French capital. Since the Fortis debacle, the link with the Belgian financial sector, represented by Maurice Lippens and Etienne Davignon, seems to have weakened.
Apart from the ease of entering data, this “serendipitous” discovery of connections with information you did not enter yourself seems to be a major advantage of using Freebase instead of a home-brew solution. It is made possible by firstly the heterogeneous but structured nature of the data Freebase contains and secondly its collaborative nature.
Regarding the first quality, we have not only board membership information available, but also ownership relations between companies, familial relations between persons, memberships of business clubs, etc. Not to mention various bits of personal information (mostly gleaned from Wikipedia), such as basic biographical details, political affiliation and/or mandate, educational institution, etc. In theory every type of subject, property or relation can be entered.
Secondly, using something like Freebase for collaboratively collecting and curating data seems a nice base for conducting power structure research (PSR). As PSR seems to have declined in popularity in the social sciences over the last two decades (a familiar tune when looking at more “progressive” subfields), online community efforts may be well suited to pick up the slack.
Joining up with recent movements such as the push for open (government) data, linked data and data-driven journalism would certainly prove worthwhile. For inspiration for this kind of “open data power structure research” in the field of interlocking directorates, one can look at LittleSis and the recently launched OpenCorporates.