How Google Became Your Company’s Most Valuable Programmer (Part 3)

This is a series of posts about how software development and programmers have become increasingly reliant on developer communities like Stack Overflow and search engines to help them develop and ship code. We think this is a fundamental shift in the nature of software development that has wide-ranging implications.

Part 1 – A Story
Part 2 – Diagnosing SAP
Part 3 – How Many Programmers are SAPs
Part 4 – Causes & Effects of SAP (Coming Soon)
Part 5 – Refactoring the Programmer (Coming Soon)

How Many Programmers are SAPs
At a guess, tens of millions. But estimating the total number of Search Addicted Programmers (SAPs) is complicated by the lack of an accurate estimate of the total number of programmers on the planet or, for that matter, an agreed definition of a programmer. http://stackoverflow.com/questions/453880/how-many-developers-are-there-in-the-world http://programmers.stackexchange.com/questions/19720/where-can-i-find-statistics-on-worldwide-developers-and-software-companies/20300#20300

One quick way to get a lower bound on the number of SAPs is to look at the traffic to websites like Stack Overflow http://www.quantcast.com/stackoverflow.com which, as of this writing, has approximately 22 million unique visitors globally per month, 85M visits and 386M page views. These numbers are staggering: they not only blow estimates for the total number of “programmers” in the world out of the water, but also point to a level of reliance on the Stack Overflow community and knowledge base that is honestly a little frightening.

And Stack Overflow is just one part of a much larger ecosystem of online resources which developers interact with and have become dependent on. Many languages and frameworks have their own communities with similarly large repositories of knowledge specific to their community. In our recent MSR 2013 paper http://thechiselgroup.org/2013/03/27/a-study-of-innovation-diffusion-through-link-sharing-on-stack-overflow/ analyzing the types of links that programmers post on Stack Overflow, we attempted to sketch out this online programmer ecosystem by analyzing the top 20 most frequently linked-to domains.

table6

We have only begun to analyze this data, and there are likely issues with some domains not being entirely programming-related (microsoft.com, apple.com, google.com for example), but given that these links are being posted to Stack Overflow we can assume that a large number of them reference programming-related topics. Unfortunately, as few of these domains publish their website traffic, this data does not really help us improve our estimate of the number of programmers that use these resources. However, it does give us a glimpse into the large and complex ecosystem of online resources that developers rely on, and into how developers share useful programming knowledge stores with other developers.

This programmer dependency on the web is obviously not a new phenomenon. To plagiarize myself: “(email, bulletin boards, Usenet, IRC, the web) were most often first colonized by programmers. The emergence of Stack Overflow is the latest evolution of this historical trend with programmers inventing or adopting a technology to meet their need to discuss what it is they do with other programmers.” (Treude et al., 2011) http://ctreude.files.wordpress.com/2012/01/programming_in_a_socially_networked_world.pdf What is new is the sheer scale of the phenomenon, both in terms of the number of programmers visibly participating in this ecosystem and the terabytes of stored programming knowledge indexed and made easily available through powerful search engines.

table7

This is a graphic I produced for our poster at the Future of Collaborative Software Development workshop at CSCW 2011 http://brendancleary.com/2012/06/05/stack-overflow-fcsd-2012-poster/ . It shows the growth in the number of Stack Overflow registered users, visits and views from its launch to early 2012. What’s really interesting about this graph is the explosion that started around September 2010, when the number of views started to grow relative to the number of unique visits. From a research perspective we don’t yet know what is causing this explosion, or what it’s doing to programmers and how they write code. But we have some ideas, which we discuss in our next post.

How Google Became Your Company’s Most Valuable Programmer (Part 2)

This is a series of posts about how software development and programmers have become increasingly reliant on developer communities like Stack Overflow and search engines to help them develop and ship code. We think this is a fundamental shift in the nature of software development that has wide-ranging implications.

Part 1 – A Story
Part 2 – Diagnosing SAP
Part 3 – How Many Programmers are SAPs
Part 4 – Causes & Effects of SAP (Coming Soon)
Part 5 – Refactoring the Programmer (Coming Soon)

Diagnosing SAP
A little while after coming to terms with my Search Addicted Programmer (SAP) condition, I got the opportunity to come work on really cool software engineering problems with the Chisel group. I decided it was probably a good time to come clean about my SAP problem, and to do it with numbers by studying myself and some of my fellow SAPs. So I conducted a small informal study looking at 3 developers’ browser histories over 20 days (60 developer days) to quantify how often we used search engines for programming-related queries.

(Note: We originally intended to turn this into a full-blown study, but that project fell through. If anybody is interested in following up on this research, let me know.)

table1

To keep things simple and to protect the privacy of the developers, we only looked at history entries that referenced a search engine, and each developer was allowed to review and edit their browser history beforehand.

table3

table2

What we found showed just how prevalent the SAP problem had become on my team: almost 1000 programming-related searches in total, and several hundred searches per developer, in only 20 days. Averaging this out, individual developers were executing 16 programming-related searches per day (at least 2 an hour), while the team as a whole was executing almost 50.

To try and quantify what this meant in terms of developer time, I used a (finger in the air, we can argue in the comments) estimate of 5 minutes per query to approximate the number of developer hours per day spent searching. This is where things start to get interesting. Based on this small study of a small number of developers over a relatively short period of time, we get an estimate of over 1.2 hours per day per developer spent searching on the web, or over 4 hours for the 3 developers per day.
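The arithmetic behind that estimate can be sketched as follows. This is only an illustration: the exact search counts are in the tables and are not reproduced in the text, so the total of 1000 searches used below is an assumed rounding of "almost 1000", and the function name and parameters are hypothetical.

```python
def search_time_estimate(total_searches, developers, days, minutes_per_query=5.0):
    """Turn a raw search count into per-day figures.

    Returns a tuple of:
      - searches per developer per day
      - hours per developer per day spent searching
      - hours per day spent searching across the whole team
    """
    developer_days = developers * days
    searches_per_dev_day = total_searches / developer_days
    hours_per_dev_day = searches_per_dev_day * minutes_per_query / 60.0
    hours_per_team_day = hours_per_dev_day * developers
    return searches_per_dev_day, hours_per_dev_day, hours_per_team_day


if __name__ == "__main__":
    # Assumed inputs: 1000 is a rounding of "almost 1000" searches;
    # 3 developers, 20 days and 5 minutes per query come from the post.
    per_dev, dev_hours, team_hours = search_time_estimate(1000, 3, 20)
    print(f"{per_dev:.1f} searches per developer per day")
    print(f"{dev_hours:.2f} hours per developer per day")
    print(f"{team_hours:.2f} hours per day for the team")
```

Changing `minutes_per_query` is the quickest way to see how sensitive the hours figure is to the finger-in-the-air assumption.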

table4

table5

Every time I look at these figures I am shocked and think 5 minutes per query is surely too much, but then I think about all the times that I have spent much longer than 5 minutes debugging some nasty problem with Google and Stack Overflow, and think it could be much worse. Again, I have no empirical data for the 5-minute finger-in-the-air estimate (if you have a paper or are interested in working on it, let me know), but if anything it feels a little on the conservative side.

While we can debate the impact of search in terms of developer hours per day, the raw number of queries executed marks web search (and the developer communities like Stack Overflow to which it enables access) as a very important part of a software developer’s toolkit. In fact, we can probably pretty safely state that developer communities like Stack Overflow, indexed and made available through powerful search engines, have become anticipated/required parts of the software development landscape, depended on not just by programmers but by software development companies and vendors alike. In the next post we try to estimate the number of programmers that are SAPs. (I’ll stop making that joke soon, I promise.)

(For a more in-depth look at programmer browser histories, Chris Parnin has since done a much more comprehensive study of the role of web resources in the practices of Android developers: http://blog.ninlabs.com/2013/03/api-documentation/)

How Google Became Your Company’s Most Valuable Programmer (Part 1)

This is a series of posts about how software development and programmers have become increasingly reliant on developer communities like Stack Overflow and search engines to help them develop and ship code. We think this is a fundamental shift in the nature of software development that has wide-ranging implications, but first a little story.

Part 1 – A Story
Part 2 – Diagnosing SAP
Part 3 – How Many Programmers are SAPs
Part 4 – Causes & Effects of SAP (Coming Soon)
Part 5 – Refactoring the Programmer (Coming Soon)

A Story
One day about 2 years ago, while working at my previous company in the middle of a heavy development cycle, my entire development team and I lost our internet connection for about a day. That morning, as a project manager getting into work and finding no external network connection, I quickly ran through my mental checklist of “Things We Need to Ship Code”©:

  1. Are my source code repositories locally hosted (Check)
  2. Bug and task tracking database locally hosted (Check)
  3. Test & Development environment on local network (Check)
  4. No external dependencies on license servers or other stupid stuff (Check)
  5. Production servers hosted offsite with redundant net connections (Check)

I began to congratulate myself: my decision to resist the lure of hosting my code repos and bug tracking database online, and to eat the maintenance cost of self-hosting, had finally paid off. Here was the perfect argument for self-hosting: the network goes down but you can still be productive and write code as if nothing had happened.
But as with all good stories there is a twist. This isn’t a story about the fragility of networks or the dangers of trusting mission-critical resources to external service providers. No, this is the story of how I learned that my mental checklist of “Things We Need to Ship Code”© was missing something very important, something without which my programmers and I could no longer effectively do our jobs.

Google.

But I am getting ahead of myself; back to the story. After congratulating myself on my cleverness, and talking with the team to confirm my cleverness and devise a plan for the day, we sat down, pulled up our issues on the bug tracker and set to work. Everything seemed to be going along fine; I actually thought the lack of email etc. for the day might be a bit of a productivity boost. But as the day progressed I started to notice that I felt… slower. Everything was just that little bit more difficult, and bugs or features which I knew should have taken only a few minutes were taking much longer than I anticipated. The reason quickly became clear: every time I ran into an unfamiliar error or needed to look up the name of an infrequently used class or method, my first instinct was to reach for my friendly search engine. But there was no internet and no search engine, and I was less productive as a result.

Things I knew I could solve in seconds with a quick query were taking what seemed like forever. I had to use a debugger to debug stuff that I knew millions of other developers had already debugged, solved and posted to the web. It felt like I was coding with a blindfold on, wasting time reinventing or re-debugging things I wouldn’t have had to if I could just run a quick search.

It was when I pulled out my phone to search for an exception on a miserable GPRS connection that I knew I had a problem: I had contracted SAP – Search Addicted Programmer. I was an addict and I knew it, and looking around I could see that same frustration and craving etched on the faces of my fellow addicts.

We were all SAPs :) (I’m sorry I couldn’t resist that one)

In my next post, I attempt to quantify how addicted to search I and my team had become.

A Study of Innovation Diffusion Through Link Sharing on Stack Overflow

See below for a preprint of our recently accepted MSR 2013 paper on how developers share tools, APIs, and libraries on Stack Overflow.

Link
MSR2013 Preprint

Abstract
It is poorly understood how developers discover and adopt software development innovations such as tools, libraries, frameworks, or web sites that support developers. Yet, being aware of and choosing appropriate tools and components can have a significant impact on the outcome of a software project. In our study, we investigate link sharing on Stack Overflow to gain insights into how software developers discover and disseminate innovations. We find that link sharing is a significant phenomenon on Stack Overflow, that Stack Overflow is an important resource for software development innovation dissemination and that it is part of a larger interconnected network of online resources used and referenced by developers. This knowledge can guide researchers and practitioners who build tools and services that support software developers in the exploration, discovery, and adoption of software development innovations.

ATLANTIS – Assembly Trace Analysis Environment

We recently presented our work on assembly trace analysis tools, featuring snazzy floating comments, at WCRE 2012 in Kingston, Ontario.

Download

http://chiselgroup.files.wordpress.com/2012/10/pid2504923.pdf

Abstract

For malware authors, software is an ever fruitful source of vulnerabilities to exploit. Exploitability assessment through fuzzing aims to proactively identify potential vulnerabilities by monitoring the execution of a program while attempting to induce a crash. In order to determine if a particular program crash is exploitable (and to create a patch), the root cause of the crash must be identified. For particular classes of programs this analysis must be conducted without the aid of the original source code using execution traces generated at the assembly layer. Currently this analysis is a highly manual, text-driven activity with poor tool support. In this paper we present ATLANTIS, an assembly trace analysis environment that combines many of the features of modern IDEs with novel trace annotation and navigation techniques to support software security engineers performing exploitability analysis.

Screenshot

Atlantis – UI

Paper “Mutual Assessment in the Social Programmer Ecosystem” – accepted to CSCW 2013

Our paper on how and why developers share their development activities on the web, “Mutual Assessment in the Social Programmer Ecosystem”, has been accepted for publication by CSCW 2013! Check out Leif Singer’s blog for more on the paper and the research process behind it: http://blog.leif.me/2012/06/mutual-assessment/

Download

http://leif.me/papers/MutualAssessment-DCS-347-IR.pdf

Abstract

The multitude of social media channels that programmers can use to participate in software development has given rise to online developer profiles that aggregate activity across many services. Studying members of such developer profile aggregators, we found an ecosystem that revolves around the social programmer. Developers are assessing each other to evaluate whether other developers are interesting, worth following, or worth collaborating with. They are self-conscious about being assessed, and thus manage their public images. They value passion for software development, new technologies, and learning. Some recruiters participate in the ecosystem and use it to find candidates for hiring; other recruiters struggle with the interpretation of signals and issues of trust. This mutual assessment is changing how software engineers collaborate and how they advance their skills.

Summary of Graduate Research: Wayfinding in Acquired Brain Injury

Think of the last time you went somewhere. Maybe you went to the grocery store, to the park, or to the university. To get there, you had to plan and follow a route. Planning and following a route is called wayfinding. Research has shown that people with brain injury often have a hard time wayfinding, which may reduce community access. But hand-held technology is becoming more advanced, and may help people with brain injury to plan and follow their routes.

With this in mind, the goal of my research was to better understand the requirements for designing a hand-held wayfinding tool for users with acquired brain injury. Through a focus group with questionnaire, as well as a series of nine personal interviews, participants with acquired brain injury shared their wayfinding experiences. They also commented on the idea and design of a hand-held wayfinding tool.

It became clear that wayfinding may invoke a deep emotional response. Some participants feel anxious about going to unfamiliar places. Others may suddenly lose their way and become terribly flustered. Coping with unexpected change, such as a detour, and feeling overwhelmed by too much going on are also serious problems. Several strategies came to light, such as marking up bus schedules and maps, going over directions step by step with a bus driver or family member, and following a series of landmarks.

Most participants were quite enthusiastic about the idea of a hand-held wayfinding tool. Some said that it would increase their confidence, but others said they would have no need for it. Cost was also a concern, along with losing or damaging the hand-held device. Many design ideas were offered, showing that each person may have different needs. There were common themes too, such as a simple interface and audio feedback. Several participants said that a map with landmarks would be very useful.

This research suggests that facilitating orientation and managing anxiety are important for survivors of acquired brain injury, and constitute complementary targets for software support. These users will likely need more support going to unfamiliar places, but their abilities and confidence may improve over time. A hand-held wayfinding tool should be aware of the abilities of each user, and respond quickly to what’s going on. It should be as interactive as possible, so that the user feels engaged and empowered. It should include information on landmarks. It must be simple to use.

This research will play an important role in a project called CanGo. CanGo is a wayfinding tool being developed at CanAssist (http://www.canassist.ca/). CanAssist is an organization at UVic that makes technology for people with disabilities. I am a software developer at CanAssist, and an active member of the CanGo team.

TagSEA for IDA

TagSEA is a CHISEL project from 2006 by Jody Ryall, Del Myers, and John Anvik. It’s an Eclipse plugin that allows developers to add tags to their code (bugs, works-in-progress, and so on). That looks something like this:

CHISEL has been doing a bunch of work with Defence Research and Development Canada, on reverse engineering. Like most reversers, they use the mighty IDA Pro to do their hacking. So, we made them a port of TagSEA. It looks like this:

Those tags can be filtered, sorted, and explored using the tree & table. Tags can be added via comments, like in the Eclipse version: “#tag MainTag.Subtag: Comment About A Tag”.

That is to say, make a tag called “Subtag”, under MainTag, with the comment “Comment About A Tag”. You can also add a specific author, or multiple tags per line: “#tag MainTag.Subtag OtherTag -author=Some Guy: Comment About A Tag”.
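To make the comment syntax concrete, here is a minimal sketch of how such a line could be parsed. It is based only on the two example strings above and is not TagSEA's actual code: the `TagComment` class and `parse_tag_comment` function are hypothetical names, and the handling of edge cases (for instance, treating the first colon as the end of the tag list) is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TagComment:
    tags: List[List[str]]   # each tag as a path, e.g. ["MainTag", "Subtag"]
    author: Optional[str]   # None when no -author= option is given
    comment: str


def parse_tag_comment(line: str) -> Optional[TagComment]:
    """Parse '#tag Tag.Sub Other -author=Name: comment'; return None if no match."""
    text = line.strip()
    prefix = "#tag "
    if not text.startswith(prefix):
        return None
    # Assumption: the first colon separates the tag list from the comment.
    head, sep, comment = text[len(prefix):].partition(":")
    if not sep:
        return None
    # Everything after "-author=" (up to the colon) is the author name,
    # which may contain spaces, as in "Some Guy".
    tag_part, _, author = head.partition("-author=")
    # Tags are whitespace-separated; dots express the parent/child hierarchy.
    tags = [tag.split(".") for tag in tag_part.split()]
    if not tags:
        return None
    return TagComment(tags=tags, author=author.strip() or None, comment=comment.strip())


if __name__ == "__main__":
    print(parse_tag_comment("#tag MainTag.Subtag: Comment About A Tag"))
    print(parse_tag_comment(
        "#tag MainTag.Subtag OtherTag -author=Some Guy: Comment About A Tag"))
```

Representing each tag as a list of path segments keeps the "Subtag under MainTag" relationship explicit, which is what a tree view like the one described would need.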

TagSEA for IDA is built in Python, courtesy of IDA’s IDAPython plugin and the PySide framework for Qt. Further work may include simplifying the sharing of tags, and making the syntax & search more robust.