Monthly Archives: August 2013

Getting started with Gephi

For this post, I’m going to provide step-by-step instructions for those of you interested in creating network graphs using Gephi.  There are certainly other free tools available for visualizing social network and textual data, such as Pajek (this website could use a serious design update), but at the time of this post, Gephi 0.8.2-beta has some significant advantages.  Software such as Pajek allows you to save your graph image as a .bmp, .png, or .svg, but Gephi allows you to save your graph image as a .pdf.

Additionally, Gephi’s most significant advantage over the competition comes from the inclusion of the sigma.js plugin, which uses the HTML canvas element to display static graphs like those generated in Gephi.  This is a massive leap forward for sharing graphs generated in Gephi, as they can now be uploaded directly to your server/website using an FTP client such as FileZilla.  Previously, interacting with a graph rather than viewing it as a static image required downloading the creator's proprietary project file and then installing the specific software needed to open it.  With the sigma.js plugin, however, interactive graphs can be displayed and shared instantly via a simple web address.

To begin the process of creating and sharing your own network graph using Gephi, I’ll break the process into a series of simple steps.  I think these instructions will be useful to those of you starting out, as a simple, step-by-step “Gephi for Dummies” manual simply doesn’t exist at the time of this post, something that I wish I had when I first started working with the software.  Gephi has a “Quick Start” guide here which is worth a look, but it leaves much to be desired as a basic guide for a novice user.  The following instructions which I’ve created owe much to the wisdom and experience of Jason Heppler and Rebecca Wingo, graduate school colleagues who provided a lot of assistance during my trial and error process of figuring out Gephi’s software.

1. Download and install Gephi from their website.

2. To download and install the Sigma.js plugin, open Gephi.  Click on the “Tools” tab and click “Plugins” from the drop-down menu.

3. Click on the “Available Plugins” tab and scroll down nearly to the bottom of the list to find the “Sigma Exporter” plugin.

4. Click the check box next to the “Sigma Exporter” plugin, then click the “Install” button at the bottom left corner of the window.  (See screenshot below.)

sigma screenshot

5. Once the plugin is downloaded and installed, close and re-open Gephi to complete the plugin installation.

6. Now it’s time to format your data for importation into Gephi. Using Microsoft Excel, create a two column data set.  The specific format for the data needs to be divided into one column as SOURCE and the second column as TARGET.  (See example screenshot below).

excel screenshot

7. Each row in the file describes one edge (or connection): the name in the SOURCE column on the left is linked to the name in the TARGET column on the right.  Every distinct name that appears in either column becomes a node in your graph.  Add one source-target row for each connection between nodes you wish to visualize.

8. Once you’ve entered in (or hopefully imported) your data and saved it as a .csv file, you’re ready to import the file into Gephi.

9. Open Gephi, click the “File” tab, then click “Open” from the drop-down menu.  Browse for your .csv file, and click the open button at the bottom of the window.

10. This will open the “Import Report” window.  Make sure the “Create Missing Nodes” box is checked, then hit OK.  (See screenshot below.)

import screenshot

11. To label your nodes in the graph, click on the “Data Laboratory” tab.

12. At the bottom of the screen, click the “Copy data to other column” button, then select “ID” from the drop-down menu.  (See screenshot below.)

labelid screenshot

13. In the pop-up box, select “Label,” then OK. (See screenshot below.)

label screenshot

14. Now click on the “Overview” tab to tinker with your graph’s spatial/visual layout.

15. Click on the “Choose a layout” tab on the lower-left part of the window to determine how you want to display your nodes and edges.  A popular choice is either of the “ForceAtlas” templates, but I’d recommend tweaking the value of the “gravity” input to expand/contract the spread of your nodes to your liking.

16. At this point, you should be able to see your network graph on the “Overview” page and can choose to export your graph as a .pdf or as a Sigma.js template.  Click the “File” tab, scroll down to “Export” and select your preferred format for exportation.  If you choose to export using the Sigma.js template, Gephi will create a folder containing files ready to be uploaded to your server/webpage.  If you want to view the graph in a browser, click on the “index” file in the folder.

17. (ADVANCED) For those of you who wish to further tweak your visualization, you can customize your node colors and sizes by linking them to particular data attributes.

18. This is accomplished using the statistical analysis options on the right side of the “Overview” interface. (See screenshot below.)

analytics screenshot

19. To color code your nodes by cluster and to emphasize the significance of particular nodes via node size, you’ll need to run at least one of several statistical analyses, which are located on the right side of the screen.

20. I found running “Avg. Path Length” under the “Edge Overview” tab to be particularly helpful in visualizing relationships between nodes.  This statistical analysis generates new metrics such as “betweenness centrality,” which, when tied to node size, visually emphasizes the more significant nodes in the graph.  (See screenshot below.)

betweenness screenshot

21. To link “betweenness centrality” to node size, click the red jewel icon on the upper-left-hand side which selects the size/weight attribute for each node, then click the “Choose a rank parameter” tab on the upper-left side and select “betweenness centrality” as the attribute you wish to link to size/weight.  (See screenshot below.)

sizeweight screenshot

22. You can link any of the attributes from the “Choose a rank parameter” tab to node color, size/weight, and labels, so there’s a great deal of customization options available to you.

23. Once you’ve established the link between your nodes and your attribute of choice, you can adjust the min/max size and color of your nodes to further customize your graph.

24.  This is as far as I’ve gone with Gephi, so I’ll end here with a couple final troubleshooting tips after you’ve exported using the Sigma.js template.
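For anyone who would rather generate the two-column file from steps 6 through 8 with a script than type it into Excel, here is a minimal Python sketch.  The names in `edges`, the helper functions, and the `edges.csv` filename are all made up for illustration, and I'm assuming a header row that follows the SOURCE/TARGET convention described above.  It also counts nodes and edges, so you can check the totals against what Gephi reports after import.

```python
import csv
import io

# Hypothetical connections; replace with your own (source, target) pairs.
edges = [
    ("Alice", "Bob"),
    ("Alice", "Carol"),
    ("Bob", "Carol"),
    ("Carol", "Dave"),
]


def write_edge_list(rows, handle):
    """Write a Source,Target header followed by one row per connection."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    writer.writerows(rows)


def summarize(rows):
    """Return (node_count, edge_count) for a list of (source, target) pairs."""
    # A node is any distinct name that appears in either column.
    nodes = {name for pair in rows for name in pair}
    return len(nodes), len(rows)


# Build the CSV in memory so the result can be inspected before saving.
buffer = io.StringIO()
write_edge_list(edges, buffer)
csv_text = buffer.getvalue()

node_count, edge_count = summarize(edges)
print(csv_text)
print(f"{node_count} nodes, {edge_count} edges")

# To save a file for Gephi instead of printing:
# with open("edges.csv", "w", newline="", encoding="utf-8") as f:
#     write_edge_list(edges, f)
```

Notice that "Dave" only ever appears in the target column but still counts as a node, which is why the “Create Missing Nodes” box matters in step 10.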

TROUBLESHOOTING:
One issue I had during my first project was that in Overview, my labels and custom node sizes showed up fine, but upon exporting the graph, all the nodes were the same, tiny size and had no labels.

To fix this particular issue and/or to customize which nodes are labelled, you’ll need to open the folder created by the Sigma.js template and then open the “config.json” file with WordPad or any other plain-text editor.  (See screenshot below.)

config screenshot

Once you’ve got the file open, scroll down to the lines “maxNodeSize” and “minNodeSize,” change the values until your nodes are large enough, save the file, and re-open the “index” file in your browser.  (See screenshot below.)

wordpad screenshot

To fix the label issue, lower the “labelThreshold” value in the config.json file.  Labels are assigned only to nodes whose size is equal to or above this threshold, so a lower value labels more nodes.  Again, save the file once these changes are made, and re-open your “index” file in your browser to view the results.
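If you export often, the same config.json edits can be scripted.  The following is only a sketch: the key names (“maxNodeSize,” “minNodeSize,” “labelThreshold”) come from the file described above, but I'm not assuming where they sit inside it, so the hypothetical `set_keys` helper searches the whole structure for them.  The `sample` data and the new values are invented stand-ins, not the contents of a real export.

```python
import json


def set_keys(node, updates):
    """Overwrite every key named in `updates`, wherever it is nested.

    Returns the number of values that were replaced.
    """
    changed = 0
    if isinstance(node, dict):
        for key, value in node.items():
            if key in updates:
                node[key] = updates[key]
                changed += 1
            else:
                changed += set_keys(value, updates)
    elif isinstance(node, list):
        for item in node:
            changed += set_keys(item, updates)
    return changed


# Stand-in for an exported config.json; the nesting shown here is a guess.
sample = json.loads(
    '{"sigma": {"drawingProperties": {"labelThreshold": 10},'
    ' "graphProperties": {"minNodeSize": 1, "maxNodeSize": 7}}}'
)

# Illustrative values only; adjust until the graph looks right in the browser.
new_values = {"minNodeSize": 3, "maxNodeSize": 20, "labelThreshold": 4}

changed = set_keys(sample, new_values)
print(f"{changed} settings updated")
print(json.dumps(sample, indent=2))

# To apply the same change to a real export (path is hypothetical):
# with open("network/config.json", encoding="utf-8") as f:
#     config = json.load(f)
# set_keys(config, new_values)
# with open("network/config.json", "w", encoding="utf-8") as f:
#     json.dump(config, f, indent=2)
```

Keep a backup copy of the original config.json before overwriting it, and re-open the “index” file afterwards to check the result.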

My first project using Gephi resulted in a network graph created using 33 nodes and 213 edges.  This was just a test run using the names of several prominent political figures from twentieth-century Mexican history, so my data set doesn’t actually analyze anything (however, I have plans, big plans for the near future.)

photo screenshot

This screenshot is equivalent to the level of interactivity you can gain from viewing the graph as a .pdf, a static image.  However, with the aid of the Sigma.js plugin, you can view a fully interactive version of the graph here, complete with clickable nodes containing a variety of attribute data.

I hope you find this tutorial useful, and I look forward to seeing your future projects using Gephi.  Stay tuned for more mapping and network graph projects I’m churning out this fall, and feel free to contact me at r.jordan@colostate.edu if you have any questions.


Public Market Construction in Mexico City, 1953-1964

The next step in my mapmaking project on Mexico City during Uruchurtu’s tenure as regent from 1952-1966 was the addition of a new data set focused on the construction of new public markets not only within the confines of the Federal District, but within the greater Mexico City metropolitan area as well.  Before examining the processes by which I created this latest layer of data, a bit of historical background should be helpful.  From 1953-1966, Uruchurtu and the DDF were responsible for the construction of 172 new markets containing over 52,000 individual vendor stalls at an estimated cost of more than half a billion pesos.  Precise numbers for construction, renovation, and maintenance costs are very difficult to obtain from existing archival sources.  However, DDF records do show that from 1953-1958, the city spent 350 million pesos renovating existing markets or constructing new ones, representing almost 8.5 percent of the total expenditures by the DDF during this period.  The financial gains from the modernization of commerce in these new markets were insignificant, and such massive expenditures from the city treasury instead served as state propaganda and a guarantee of support from a new political interest group composed of comerciantes en pequeño (petty merchants).

This newly formed economic and political covenant with the Frente Unico de Locatarios y Comerciantes en Pequeño del D.F. would help to reverse the PRI’s political fortunes in Mexico City, helping to boost support for the ruling party among the working classes.  The openings of major markets such as La Merced, Jamaica, and Tepito were highly touted political events showcasing the commitment of the Revolution to bringing about economic equality for all.  In October 1957, the opening of a massive market containing 4,488 vendor stalls in the barrio of Tepito was attended by thousands of vendors, members of Congress, senior ministers, Uruchurtu and department chiefs within the DDF, and even President Adolfo Ruiz Cortines, reportedly the first Mexican president ever to set foot, let alone hold a major political event, in this neighborhood notorious for its poverty and street crime.

However, beyond the political gains which resulted from the construction of new, modern marketplaces, Uruchurtu sought to eliminate the disease, crime, and immorality which city officials associated with the “market days” or tianguis which had been a part of Mexico City’s economic and social traditions since the time of the Aztecs.  These chaotic, unregulated markets were portrayed as rife with pickpockets, dealing in black market goods, sickening residents through the sale of spoiled food, and corrupting the morality of the populace through the sale of cheap alcoholic beverages.  The commercial activity created by these marketplaces spilled out of plazas into the surrounding streets of the city.  Pedestrians and large trucks continually flowed past sidewalks filled with vendors’ stalls, slowing traffic to a crawl.

This “cork” on the flow of buses, sanitation crews, and general commerce reportedly affected more than 530,000 square meters of the urban landscape, and according to the DDF, was analogous to an “ever-growing cancer” on the city.  The destruction of these old markets and the containment of petty merchants within new, modern market buildings was a priority for the well-being of the city and its residents. Clean, modern markets complete with electric lighting, refrigeration, ventilation, an open, spacious design, childcare for vendors, and police surveillance could help to sanitize and decongest the city streets while at the same time improve the physical health of urban residents by providing them with fresh, healthy food and a safe, moral environment in which to shop for the basic necessities of life.

My data set on the period from 1953-1964 contains information on 129 public markets within the metropolitan area, for which I included the names, the number of stalls, and the latitude/longitude coordinates of each market.  As with the information on public lighting, this data came directly from the DDF’s own records at the Archivo Historico del Distrito Federal (AHDF) in Mexico City which I compiled during my dissertation research in the fall of 2011.  However, in my attempts to locate and verify the address information provided by the DDF, a fair amount of detective work was required.  The DDF data only provided the cross streets for each market’s address, a method which works fine for an address listing like Wagner y Mozart due to the unique street names.  However, the ease of geographically locating a market can be dramatically different with a listing like Constitucion y Jalisco, for which there might be four or five streets named Constitucion as well as Jalisco located throughout the city.

Additionally, in several cases, the address listing provided by the DDF (and sometimes the geographic marker provided by Google Maps, if there was one) was incorrect by entire city blocks.  This led me to walk the city streets using Street View in Google Maps, doing some detective work until I located the market around a corner or down the street several blocks.  This ability to virtually explore the city via Google was invaluable in making my map as accurate as possible and it really is a technological marvel that I can track down a public market in Mexico City from my computer at home here in Fort Collins.  What would normally take a physical visit to the city and possibly asking locals for directions now can be done in minutes on a computer thousands of miles away.  But I gush too much about my appreciation of all things Google.

As opposed to the previous layer of data on public lighting which was composed of both linear and polygon overlays onto Google Earth, I created three separate maps each capable of representing the data in a unique way.   For the first map, rendered on Google Earth, I imported my data set created in Excel into Google Fusion Tables, which then allowed me to create a .kml file which can be downloaded here.  A simple upload of the .kml file to Google Earth, and the new layer was added on top of the existing layer on public lighting.  This simple layer of pushpins is not very visually telling in itself, but the great thing about Google Earth is the ability for users to attach photos, videos, and links to other sites to each marked location on the map.
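I let Google Fusion Tables build the .kml file for me, but the format itself is simple enough that a short script can produce one directly.  The sketch below is an alternative route, not the method I used: the `markets` rows are invented placeholders (not values from the DDF records), and the `placemark` and `build_kml` helpers are hypothetical names.  It writes one pushpin per market with the stall count in the description.

```python
from xml.sax.saxutils import escape

# Placeholder rows (name, stalls, latitude, longitude).
# These are NOT values from the DDF records.
markets = [
    ("Example Market A", 100, 19.40, -99.10),
    ("Example Market B & Annex", 250, 19.45, -99.15),
]


def placemark(name, stalls, lat, lon):
    """Return one KML Placemark element as text."""
    # KML lists coordinates as longitude,latitude (the reverse of the usual order).
    return (
        "  <Placemark>\n"
        f"    <name>{escape(name)}</name>\n"
        f"    <description>{stalls} stalls</description>\n"
        f"    <Point><coordinates>{lon},{lat}</coordinates></Point>\n"
        "  </Placemark>\n"
    )


def build_kml(rows):
    """Wrap one Placemark per row in a minimal KML document."""
    body = "".join(placemark(*row) for row in rows)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2">\n'
        "<Document>\n" + body + "</Document>\n</kml>\n"
    )


kml_text = build_kml(markets)
print(kml_text)

# To save a file that Google Earth can open (filename is hypothetical):
# with open("markets.kml", "w", encoding="utf-8") as f:
#     f.write(kml_text)
```

The one detail that is easy to get wrong is coordinate order: a spreadsheet usually stores latitude first, while KML expects longitude first.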

For La Merced, one of the more famous and grandiose markets in Mexico City from this time period, I attached an image of the market’s interior just prior to its opening in the fall of 1957.  Hypothetically, a group of users could collaboratively attach media and links to further information on any marked location on this map, creating an interactive, historical atlas of the city which visitors could explore location by location or via tours along preselected routes.  This interactive capacity is a big selling point for Google Earth as a means of conveying a variety of information on historical locations and its ease of use makes it possible for just about anyone to create a map quickly.

New Markets 1952-1964_google earth view

For the second map, I made the map pictured below using Tableau Public.  As I can’t embed html directly into this blog, please click here to explore the interactive version of this map.  For this visualization, I’ve linked the number of stalls attribute directly to the size of the bubble, moving beyond the simple pushpin display found on Google Maps/Earth.  This allows the viewer to see the relative size of each market based on vendor capacity and thus get a sense of market concentration in various parts of the city.

New Markets 1952-1964_tableau view
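To give a sense of what linking stall counts to bubble size involves, here is a rough sketch of one common convention: scale the bubble's area, rather than its radius, to the value so that large markets are not visually exaggerated.  This is my own illustration, not a description of how Tableau Public does its scaling internally.  The `bubble_radius` helper, the 30-unit maximum radius, and the small market are hypothetical; the 4,488 figure is the Tepito stall count mentioned earlier.

```python
import math


def bubble_radius(stalls, max_stalls, max_radius=30.0):
    """Radius that makes bubble area proportional to the stall count."""
    if stalls < 0 or max_stalls <= 0:
        raise ValueError("stall counts must be non-negative, with a positive maximum")
    # Area grows with the square of the radius, so take the square root of the ratio.
    return max_radius * math.sqrt(stalls / max_stalls)


# Tepito's count comes from the text; the second market is invented.
stall_counts = {"Tepito": 4488, "Hypothetical small market": 1122}

largest = max(stall_counts.values())
radii = {name: bubble_radius(count, largest) for name, count in stall_counts.items()}

for name, radius in radii.items():
    print(f"{name}: radius {radius:.1f}")
```

With this scaling, a market with a quarter of the stalls gets half the radius, so its bubble covers a quarter of the area.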

Lastly, to further analyze market concentration in various parts of the city, I used CartoDB to generate this map.  Again, please do click the link to see the interactive version.  This intensity map was created on my favorite basemap template, GMaps Dark (which I dearly wish other map programs had available), and uses three thermal rings surrounding each market location to emphasize the physical concentration of markets in various parts of the city.  It’s not quite as striking as a true choropleth map, but I just don’t have the data sets to do something like that for Mexico City.

New Markets 1952-1964_cartodb view

The process of creating these three maps has been an incredible learning experience for me both technically and as a scholar, and I hope the information conveyed helps to further illuminate this period of the city’s history.  My next layer of mapped data will likely be on new school construction, but first I plan to take a small break from maps to create my first network graph on Gephi.  Stay tuned for that graph in the very near future and thanks to everyone for the support, advice, and encouragement during this series of projects.
