How Search Became Your Company’s Most Valuable Programmer (Part 5)

This is a series of posts about how software development and programmers have become increasingly reliant on developer communities like Stack Overflow and search engines to help them develop and ship code. We think this is a fundamental shift in the nature of software development that has wide ranging implications.

Part 1 – A Story
Part 2 – Diagnosing SAP
Part 3 – How Many Programmers are SAPs
Part 4 – Causes & Effects of SAP
Part 5 – Refactoring the Programmer

Refactoring the Programmer
How programmers develop software is changing rapidly. Software developer communities and powerful search engines are becoming an increasingly integral (and often unacknowledged) part of the software development process. Driven by increasing solution complexity and the proliferation of frameworks, libraries and platforms required to deliver a single solution, programmers (both novice and expert) frequently find themselves faced with having to integrate and debug multiple heterogeneous pieces of code but without the time to develop even a modest understanding of those components. Search engines and developer communities like Stack Overflow provide a lifeline to the programmer allowing them to mine the stored collective intelligence of vast communities of fellow software developers. However, the reliance of programmers on these communities raises some interesting questions for software companies, software engineering researchers and educators alike.

Since coming to work with Chisel, I and others like Margaret-Anne Storey, Christoph Treude and Leif Singer have attempted to capture the impact that large developer knowledge stores like Stack Overflow (indexed and made accessible through search engines) are having on programmers and software development in general. The Social Programmer is a concept that attempts to sum up the new reality faced by many programmers and asks the question: do we need to rethink our concept of what a programmer is and what they do? In this post we ask: do we need to refactor the programmer?

(Note the following sections are based on our FCSD 2011 paper (http://ctreude.ca/2012/01/05/futurecsd2012/) where we introduced the concept of the social programmer)

The Evolution of the Social Programmer
The emergence of Stack Overflow as a repository of programming knowledge is the latest evolution of a historical trend with programmers inventing or adopting a “social” technology to meet their need to discuss what it is they do with other programmers. However, with sites such as Stack Overflow indexed and made available through search engines, are we approaching the point where the archive of stored programming knowledge reaches a critical mass and where new programming practices and behaviors will emerge? Have we already reached that point and what kinds of impact might we see on programmer practices and the software development industry as a result?

fig3

We think the programmer has already been refactored. It's happening day by day, largely invisibly, but it is evidenced in the ever increasing contributions by programmers to communities like Stack Overflow and GitHub and by the proliferation of programming languages, libraries and platforms. Together these online services for programmers (infrastructure), the programmers that participate and a shared philosophy make up the social programmer ecosystem. We are currently in the process of charting this ecosystem.

What Makes a Good Programmer?
For a programmer in the social programmer ecosystem, where a large percentage of programming knowledge is archived and curated by millions of “experts”, do we have to redefine the attributes of a good programmer? Is the metric of a good programmer someone with a deep understanding of programming and software engineering principles, or someone who can leverage and synthesize the programming community to achieve the same results? When you are very unlikely to be the first person in the world to encounter a particular problem, does a smart programmer attempt to diagnose the problem independently or just ask the community? Will an entirely new category of programmers emerge, a Just in Time Programmer, without formal training but with the ability to combine snippets to craft solutions that will just meet their needs? What tools will these new programmers require?

Software Development as a Massively Distributed Activity
If future programmers will require the ability to synthesize a single solution from the contributions of the many, it raises interesting questions (both positive and negative) about the nature of the software that those programmers produce. Given an environment where practically all software is developed with reliance on a shared knowledge base and community, who actually owns the intellectual property? Is there a risk that programmers do not really understand how their software works? Or will it in fact lead to better efficiency by reducing time spent fixing bugs or re-inventing the wheel? Could the distributed development approach to programming actually increase quality by promoting best practice solutions? Is this a realization of software componentization, different from the vision of component based development perhaps but effectively a similar result? What organizational changes will this shift entail in terms of social offshoring and the ad hoc creation of teams?

Conclusion
Ultimately it’s too early to make a judgment on the social programmer and the impact the phenomenon will have on software development practice. We are already starting to see changes in the skills companies and recruiters are looking for in programmers (http://thechiselgroup.org/2012/10/31/paper-mutual-assessment-in-the-social-programmer-ecosystem-accepted-to-cscw-2013/). At the moment this appears to be confined to a distinct subset of programmers but it will be interesting to see if it spreads and how far. There will also always be programmers who find it difficult to fully participate in these communities, for example because of intellectual property or security restrictions. How we bring the benefits of the social programmer ecosystem to these programmers is an interesting challenge.

A final thought. An argument frequently raised when discussing these topics is quality. What is this trend going to do to software quality? Are programmers going to become copy and paste programmers, finding snippets of code on the web and then pulling them together to craft a solution without any understanding of how it works? Frankly I’m sure this is already happening, but it’s not fully clear if this is a new trend, if it can be stopped or if it is an entirely bad thing. We will probably see the emergence of a group of programmers that lack foundational computer science and software engineering concepts, but this is not necessarily new: many practicing programmers were not trained as programmers. The potential benefit of the social programmer ecosystem is that these programmers can interact with and learn from communities of knowledgeable programmers where expertise is valued and where best practices can be disseminated.

Well, we hope :)


How Search Became Your Company’s Most Valuable Programmer (Part 4)

This is a series of posts about how software development and programmers have become increasingly reliant on developer communities like Stack Overflow and search engines to help them develop and ship code. We think this is a fundamental shift in the nature of software development that has wide ranging implications.

Part 1 – A Story
Part 2 – Diagnosing SAP
Part 3 – How Many Programmers are SAPs
Part 4 – Causes & Effects of SAP
Part 5 – Refactoring the Programmer

Causes & Effects of SAP
So the completely fake condition SAP is actually probably real and apparently affects millions of programmers, but what causes it? While researchers have been studying social media and the role of web 2.0 in software development for a while, the full impact of this shift in programming practice is really only starting to become apparent now. In this post we combine speculation and bits of empirical research to try to understand what is causing this shift in programmer practices and the impact it may have on software development.

Increasing Solution Complexity
Programmers are increasingly faced with having to program in multiple languages and across multiple platforms http://www.drdobbs.com/architecture-and-design/the-quiet-revolution-in-programming/240152206 . In the last several years a shift in user experience expectations and the rise of mobile and touch centric hardware has caused a splintering in the set of development technologies required to deploy a product to the widest possible audience. This paradigm shift can be seen most clearly in two parallel trends: the explosion of JavaScript frameworks for the web (jQuery, NodeJS, Bootstrap) and the introduction of rich client mobile applications and their associated marketplaces (iOS, Android, Windows).

The software development landscape is now an acronym soup of competing development platforms with no clear winner in sight. Our CSCW 2013 paper on how developers evaluate other developers (http://thechiselgroup.org/2012/10/31/paper-mutual-assessment-in-the-social-programmer-ecosystem-accepted-to-cscw-2013/) indicates that developers, companies and recruiters are adapting to this new environment. Whereas once companies looked for employees with deep experience in a particular technology stack, now companies are starting to look for developers that can show a wide range of skills across multiple technology stacks and an ability to learn quickly. For many programmers this confusion of development platforms means being able to develop expertise in a single technology stack is now an unaffordable luxury. Rather, programmers are expected (using resources like Stack Overflow) to be instantly conversant in multiple languages and platforms and to be able to integrate with whatever library, component or service may be required to deliver a solution.

This is speculation, but there is likely an interesting feedback loop here between the amount of indexed programming knowledge available on sites like Stack Overflow and the increase in the complexity and heterogeneity of solutions that programmers are developing. It’s very possible that one trend enables and promotes the other. For example an increasing number of posts on a wide range of programming technologies on Stack Overflow increases the confidence programmers have in their ability to successfully combine more disparate technologies into a functioning solution. This in turn inevitably leads to more questions and answers being posted to Stack Overflow about the individual technologies and how to combine them. Which leads to greater programmer confidence….which leads…, rinse and repeat.

fig1

I am not aware of research that looks at how adventurous developers are in taking on complicated multi-language/platform projects given the presence or absence of the documentation provided by developer communities. But it would be a good paper.

Poor Official Documentation
Another factor pushing the increasing reliance of developers on search and sites like Stack Overflow is the state of official documentation. The official technical documentation published by platform and language vendors often pales in quality in comparison to the crowd documentation generated by developers contributing to sites like Stack Overflow. There have been several interesting studies done on this topic http://ctreude.ca/2011/02/16/stackoverflow/ and http://www.cc.gatech.edu/~vector/papers/webdocumentation.pdf and http://blog.ninlabs.com/2013/03/api-documentation/ looking at what kinds of crowd documentation is available and how much depth and coverage it provides vs. the official documentation.

An obvious differentiator between official static documentation and the type of crowd documentation provided by something like the Stack Overflow community is the interactivity and the sheer number of potential contributors. Mamykina et al 2011 show that a question asked on Stack Overflow has a median answer time of 11 minutes http://bid.berkeley.edu/files/papers/mamykina-stackoverflow-chi2011.pdf but with the large number of users on Stack Overflow some questions can start receiving answers mere seconds after posting. Very few vendors can ever hope to offer the level of documentation or developer support that can be generated by the Stack Overflow community machine. However, this does not mean that all official documentation is redundant or unnecessary. In fact our study of links posted to Stack Overflow shows that developers do frequently refer to official vendor documentation in their posts http://thechiselgroup.org/2013/03/27/a-study-of-innovation-diffusion-through-link-sharing-on-stack-overflow/

fig2

Again I speculate that we might be witnessing, or about to witness, another interesting feedback loop where increasing quality and coverage of crowd documentation reduces the impetus for vendors to provide good quality documentation. Alternatively we may see vendors change their documentation and developer support strategies to incorporate features from sites like Stack Overflow or to start officially participating and providing support through these types of sites (this has been happening for quite some time at the individual level but I am not sure if it's official policy for many vendors yet).

The search engine as the World’s Fastest Debugger
Ok, so think about this scenario: you are a programmer writing some code on a particular platform and you encounter a problem. What is the probability that you are the first programmer in the world to ever encounter that error/bug? It happens, yes, but as long as you are not working in something very exotic (http://xkcd.com/979/) I think you will agree that the likelihood is pretty low. Now add to this scenario thousands of communities like Stack Overflow where millions of other programmers are continually posting questions and answers about these types of errors. Finally index it and make the whole thing easily accessible through powerful search engines.

Congratulations, you have just created the world’s fastest and smartest distributed debugger. (Note: I can’t remember if I came up with the search engine as a debugger or if I am stealing it from someone else. If I am, let me know and I will cite you.)

Now I am not suggesting that the act of debugging is replaced by having access to Stack Overflow and excellent search engines (after all someone has to post the first answer). Rather I am talking about the effect this stored debugging knowledge is having on the process of debugging. Why would a developer spend time self-debugging when they can quickly access the results of another developer's debugging efforts? I know I do it all the time. When I encounter an error where the solution isn’t immediately obvious, I do a quick search. The cost of doing the search is so low and the potential benefit so large that it’s a no-brainer.

I like to think of the search engine as a debugger concept as an important (but overlooked) part of the trend towards reducing the edit-compile-run-debug cycle that is occurring across all parts of the programming world http://liveprogramming.github.io/2013/about.html. The search engine as a debugger gives programmers more immediate feedback on how to solve their programming problems, hopefully allowing them to solve the problem and move on more quickly. And if you find that you actually are the first one to encounter and solve the problem you can always post the answer to Stack Overflow and get some nice karma!

More importantly though I think the search engine as the debugger is a symptom not just of a simple one dimensional change in programmer practices but rather a wholesale shift in what it means to be a programmer. Our next post asks the question, is it time to refactor our idea of the programmer?


Reconstructing Program Memory State from Multi-gigabyte Instruction Traces to Support Interactive Analysis

See below for a preprint of our recently accepted WCRE 2013 paper.

Link: WCRE2013 Preprint

Abstract:
Exploitability analysis is the process of attempting to determine if a vulnerability in a program is exploitable. Fuzzing is a popular method of finding such vulnerabilities, in which a program is subjected to millions of generated program inputs until it crashes. Each program crash indicates a potential vulnerability that needs to be prioritized according to its potential for exploitation. The highest priority vulnerabilities need to be investigated by a security analyst by re-executing the program with the input that caused the crash while recording a trace of all executed assembly instructions and then performing analysis on the resulting trace. Recreating the entire memory state of the program at the time of the crash, or at any other point in the trace, is very important for helping the analyst build an understanding of the conditions that led to the crash. Unfortunately, tracing even a small program can create multi-million line trace files from which reconstructing memory state is a computationally intensive process and virtually impossible to do manually. In this paper we present an analysis of the problem of memory state reconstruction from very large execution traces. We report on a novel approach for reconstructing the entire memory state of a program from an execution trace that allows near real-time queries on the state of memory at any point in a program’s execution trace. Finally we benchmark our approach showing storage and performance results in line with our theoretical calculations and demonstrate memory state query response times of less than 100ms for trace files up to 60 million lines.
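
To make the kind of query described in the abstract concrete, here is a minimal sketch of one simple way to answer "what was stored at this address at this point in the trace?". It is not the approach from the paper: the `MemoryHistory` class, its method names and the idea of feeding it one `(line, address, value)` write at a time are all assumptions made for illustration. It keeps a per-address list of writes in trace order and uses a binary search to find the last write at or before the requested trace line.

```python
from bisect import bisect_right
from collections import defaultdict


class MemoryHistory:
    """Hypothetical per-address write log; not the data structure used in the paper."""

    def __init__(self):
        self._lines = defaultdict(list)   # address -> trace line numbers of writes, ascending
        self._values = defaultdict(list)  # address -> value written at the matching line

    def record_write(self, line, address, value):
        """Record one write. Must be called in trace order so the lists stay sorted."""
        self._lines[address].append(line)
        self._values[address].append(value)

    def value_at(self, address, line):
        """Return the value held at `address` after trace line `line`, or None if never written."""
        lines = self._lines.get(address)
        if not lines:
            return None
        idx = bisect_right(lines, line)  # number of writes at or before `line`
        return self._values[address][idx - 1] if idx else None


history = MemoryHistory()
history.record_write(10, 0x1000, 0xAA)
history.record_write(250, 0x1000, 0xBB)
print(history.value_at(0x1000, 100))  # value from the write at trace line 10
```

Each query costs one binary search over the writes to a single address, which is why this kind of index can answer point queries quickly. Reconstructing the entire memory state at a given line would need more than this sketch provides.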

20 hrs in Hacker News Top 10

Last weekend I was in beautiful sunny San Francisco for 3 days to present our paper on mining URLs posted by developers on Stack Overflow at MSR2013. Along with the paper we wrote, I also launched a website (linkedlists.net) that allows users to interactively explore our dataset.

Following my short talk I posted the link to Hacker News and then sat back and watched as my world exploded.

In about 5 minutes we were on the Hacker News front page, in another 5 minutes we were at number 2 fighting with a TechCrunch article about Hacker News for the top spot. While we didn’t (I think) get to number 1 we did stay in the Top 10 for almost the next 20 hours and only fell out of the top 50 a few days later.

The reaction has been, in a word, incredible. I don’t think any of us were expecting the level of interest and positivity that we received from the developer community. To give you an idea of what being in the Top 10 of Hacker News looks like from the perspective of a server this is a screen shot of the requests per minute our AWS load balancer was dealing with.

linkedlists_requests_per_minute

Now I agree it’s not exactly Google scale, there are plenty of small websites that easily handle worse every day, but for a small software engineering research project it’s pretty substantial.
As of right now Google Analytics is telling me that since launch we have had over 29K unique visitors and almost 36K page views. Importantly, we are starting to see people coming back, with our number of returning visitors currently at a couple of thousand and average time on the site creeping over the 3 minute mark.

I would like to thank everybody that helped with the research and testing the site, the 100’s of people that helped spread the word on twitter and finally all those who submitted feature requests.

We have lots of great ideas about what to do next so please stay tuned by following us on twitter @listslinked

How Search Became Your Company’s Most Valuable Programmer (Part 3)

This is a series of posts about how software development and programmers have become increasingly reliant on developer communities like Stack Overflow and search engines to help them develop and ship code. We think this is a fundamental shift in the nature of software development that has wide ranging implications.

Part 1 – A Story
Part 2 – Diagnosing SAP
Part 3 – How Many Programmers are SAPs
Part 4 – Causes & Effects of SAP
Part 5 – Refactoring the Programmer

At a guess, 10s of millions, but estimating the total number of Search Addicted Programmers (SAP) is complicated by not really having an accurate estimate of the total number of programmers on the planet, or for that matter, how to define a programmer. http://stackoverflow.com/questions/453880/how-many-developers-are-there-in-the-world http://programmers.stackexchange.com/questions/19720/where-can-i-find-statistics-on-worldwide-developers-and-software-companies/20300#20300

One quick way to get a lower bound on the number of SAP programmers is to look at the traffic to websites like Stack Overflow http://www.quantcast.com/stackoverflow.com which as of this writing has approximately 22 million unique visitors globally per month, 85M visits and 386M page views. These numbers are just simply staggering, not only completely blowing estimates for the total number of “programmers” in the world out of the water but also pointing to a level of reliance of programmers on the Stack Overflow community and knowledge base that is honestly a little frightening.

And Stack Overflow is just one part of a much larger ecosystem of online resources which developers interact with and have become dependent on. Many languages and frameworks have their own communities with similarly large repositories of knowledge specific to their community. In our recent MSR 2013 paper http://thechiselgroup.org/2013/03/27/a-study-of-innovation-diffusion-through-link-sharing-on-stack-overflow/ analyzing the types of links that programmers post on Stack Overflow we attempted to sketch out this online programmer ecosystem by analyzing the top 20 most frequently linked to domains.

table6

We have only begun to analyze this data and there are likely issues with some domains not being entirely programming related (microsoft.com, apple.com, google.com for example) but given that these links are being posted to Stack Overflow we can assume that a large number of them are referencing programming related topics. Unfortunately as few of these domains publish their website traffic, this data does not really help us improve our estimate of the number of programmers that use these resources. However, it does give us a glimpse into the large and complex ecosystem of online resources that developers rely on and how developers share useful programming knowledge stores with other developers.
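
As a rough illustration of the kind of counting behind a "most frequently linked to domains" table, here is a small sketch. It is not the analysis pipeline from the paper: the `top_domains` function, the decision to strip a leading `www.` and the sample URLs are all assumptions made for this example.

```python
from collections import Counter
from urllib.parse import urlparse


def top_domains(urls, n=20):
    """Count how often each host appears in `urls` and return the n most common."""
    counts = Counter()
    for url in urls:
        host = urlparse(url).hostname
        if not host:
            continue  # skip malformed links
        if host.startswith("www."):
            host = host[4:]  # assumption: treat www.example.com and example.com as one domain
        counts[host] += 1
    return counts.most_common(n)


# Made-up sample links, not data from the study.
sample_links = [
    "http://msdn.microsoft.com/en-us/library/example",
    "http://www.github.com/someuser/somerepo",
    "https://github.com/anotheruser/anotherrepo",
    "http://en.wikipedia.org/wiki/Example",
]
print(top_domains(sample_links, n=1))
```

Whether subdomains such as `msdn.microsoft.com` are grouped under `microsoft.com` changes the ranking considerably, so that choice would need to be stated alongside any real table.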

This programmer dependency on the web is obviously not a new phenomenon, to plagiarize myself “(email, bulletin boards, Usenet, IRC, the web) were most often first colonized by programmers. The emergence of Stack Overflow is the latest evolution of this historical trend with programmers inventing or adopting a technology to meet their need to discuss what it is they do with other programmers.” (Treude et al, 2011) http://ctreude.files.wordpress.com/2012/01/programming_in_a_socially_networked_world.pdf But what is new is the sheer scale of the phenomenon both in terms of the numbers of programmers visibly participating in this ecosystem and the terabytes of stored programming knowledge indexed and made easily available through powerful search engines.

table7

This is a graphic I produced for our poster at Future of Collaborative Software Development workshop at CSCW 2011 http://brendancleary.com/2012/06/05/stack-overflow-fcsd-2012-poster/ . It shows the growth in the number of Stack Overflow registered users, visits and views since its launch to early 2012. What’s really interesting about this graph is the explosion that started about September 2010 when the number of views started to grow relative to the number of unique visits. From a research perspective we don’t yet know what is causing this explosion or what it’s doing to programmers and how they write code. But we have some ideas, which we discuss in our next post.


How Search Became Your Company’s Most Valuable Programmer (Part 2)

This is a series of posts about how software development and programmers have become increasingly reliant on developer communities like Stack Overflow and search engines to help them develop and ship code. We think this is a fundamental shift in the nature of software development that has wide ranging implications.

Part 1 – A Story
Part 2 – Diagnosing SAP
Part 3 – How Many Programmers are SAPs
Part 4 – Causes & Effects of SAP
Part 5 – Refactoring the Programmer

Diagnosing SAP
A little while after coming to terms with my Search Addicted Programmer (SAP) condition, I got the opportunity to come work on really cool software engineering problems with the Chisel group. I decided it was probably a good time to come clean about my SAP problem and to do it with numbers by studying myself and some of my fellow SAPs. So I conducted a small informal study looking at 3 developers' browser histories over 20 days (60 developer days) to quantify how often we used search engines for programming related queries.

(Note: We were originally intending to make this into a full blown study but that project fell through. If anybody is interested in following up on this research let me know.)

table1

To keep things simple and to protect the privacy of the developers, we only looked at history entries that referenced a search engine, and each developer was allowed to review and edit their browser history beforehand.
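
The filtering step described above can be sketched roughly as follows. This is an illustration only, not the script used in the study: the list of search engine hosts, the assumption that the query lives in a `q` parameter and the sample history entries are all hypothetical.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical host list; the study does not say which search engines were counted.
SEARCH_HOSTS = {"www.google.com", "www.bing.com", "duckduckgo.com"}


def extract_search_queries(history_urls):
    """Return the query text of every history entry that points at a search engine."""
    queries = []
    for url in history_urls:
        parsed = urlparse(url)
        if parsed.hostname not in SEARCH_HOSTS:
            continue  # ignore everything that is not a search engine visit
        terms = parse_qs(parsed.query).get("q")  # assumption: query is in the "q" parameter
        if terms:
            queries.append(terms[0])
    return queries


# Made-up history entries, not data from the study.
history = [
    "https://www.google.com/search?q=java+nullpointerexception+hashmap",
    "https://example.com/some/private/page",
    "https://duckduckgo.com/?q=python+sort+dict+by+value",
]
print(extract_search_queries(history))
```

Deciding which of the extracted queries are programming related is a separate, manual or heuristic step that this sketch does not attempt.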

table3

table2

What we found showed just how prevalent the SAP problem had become on my team. Almost 1000 total programming related searches, and several hundred searches per developer in only 20 days. Averaging this out we see that individual developers were executing 16 programming related searches per day (at least 2 an hour) while the team was executing almost 50.

To try and quantify what this meant in terms of developer time, I used a (finger in the air, we can argue in the comments) estimate of 5 minutes per query to approximate the number of developer hours per day spent searching. This is where things start to get interesting. Based on this small study of a small number of developers over a relatively short period of time, we get an estimate of over 1.2 hours per day per developer spent searching on the web, or over 4 hours for the 3 developers per day.
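
For anyone who wants to argue with the 5 minute figure, the arithmetic is easy to redo. This small sketch uses the rounded numbers quoted above (16 searches per developer per day and roughly 50 for the team); the function name is just for illustration, and because "almost 50" is rounded up here the team figure comes out slightly high.

```python
MINUTES_PER_QUERY = 5  # the finger-in-the-air estimate from the post


def search_hours_per_day(queries_per_day, minutes_per_query=MINUTES_PER_QUERY):
    """Convert a daily query count into hours spent searching per day."""
    return queries_per_day * minutes_per_query / 60


per_developer = search_hours_per_day(16)  # rounded per-developer average
team = search_hours_per_day(50)           # "almost 50" per day for the 3 developers
print(f"{per_developer:.2f} h per developer, {team:.2f} h for the team")
```

Changing `MINUTES_PER_QUERY` to 2 or 10 shows how sensitive the estimate is to that single guess.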

table4

table5

Every time I look at these figures I am shocked and think 5 minutes per query is surely too much, but then I think about all the times that I have spent much longer than 5 minutes debugging some nasty problem with search and Stack Overflow and think it could be much worse. Again I have no empirical data for the 5 minute finger in the air estimate (if you have a paper or are interested in working on it let me know), but if anything it feels a little on the conservative side.

While we can debate the impact of search in terms of developer hours per day, the raw number of queries executed marks web search (and the developer communities like Stack Overflow to which it enables access) as a very important part of a software developer’s toolkit. In fact we can probably pretty safely state that developer communities like Stack Overflow, indexed and made available through powerful search engines, have become anticipated/required parts of the software development landscape, depended on not just by programmers but by software development companies and vendors alike. In the next post we try to estimate the numbers of programmers that are SAPs. (I’ll stop making that joke soon, I promise.)

(For a more in depth look at programmer browser histories, Chris Parnin has since done a much more comprehensive study of the role of web resources in the practices of Android developers: http://blog.ninlabs.com/2013/03/api-documentation/)


How Search Became Your Company’s Most Valuable Programmer (Part 1)

This is a series of posts about how software development and programmers have become increasingly reliant on developer communities like Stack Overflow and search engines to help them develop and ship code. We think this is a fundamental shift in the nature of software development that has wide ranging implications, but first a little story.

Part 1 – A Story
Part 2 – Diagnosing SAP
Part 3 – How Many Programmers are SAPs
Part 4 – Causes & Effects of SAP
Part 5 – Refactoring the Programmer

A Story
One day about 2 years ago while working at my previous company, in the middle of a heavy development cycle, I and my entire development team lost our internet connection for about a day. That morning as a project manager getting into work and finding no external network connection I quickly did a run-through of my mental checklist of “Things We Need to Ship Code”©

  1. Are my source code repositories locally hosted (Check)
  2. Bug and task tracking database locally hosted (Check)
  3. Test & Development environment on local network (Check)
  4. No external dependencies on license servers or other stupid stuff (Check)
  5. Production servers hosted offsite with redundant net connections (Check)

I began to congratulate myself, my decision to resist the lure of hosting my code repos and bug tracking database online and to eat the maintenance cost of self-hosting had finally paid off. Here was the perfect argument for self-hosting, the network goes down but you can still be productive and write code as if nothing had happened.
But as with all good stories there is a twist. This isn’t the story about the fragility of networks or the dangers of trusting mission critical resources to external service providers. No, this is the story of how I learned that my mental checklist of “Things We Need to Ship Code”© was missing something very important, something without which my programmers and I could no longer effectively do our jobs.

Search.

But I am getting ahead of myself, back to the story. After congratulating myself on my cleverness and talking with the team to confirm my cleverness and devise a plan for the day we sat down, pulled up our issues on the bug tracker and set to work. Everything seemed to be going along fine, I actually thought the lack of email etc for the day might be a bit of a productivity boost. But as the day progressed I started to notice that I felt… slower…. everything was just that little bit more difficult and bugs or features which I knew should have only taken a few minutes, were taking much longer than I anticipated. The reason quickly became clear, every time I ran into an unfamiliar error or needed to look up the name of an infrequently used class or method, my first instinct was to reach for my friendly search engine. But there was no internet and no search engine and I was less productive as a result.

Things I knew I could solve in seconds with a quick query were taking what seemed like forever. I had to use a debugger to debug stuff that I knew millions of other developers had already debugged, found the solution and posted to the web. It felt like I was coding with a blindfold, wasting time, reinventing or re-debugging things which I wouldn’t have had to if I could just run a quick search.

It was when I pulled out my phone to search for an exception on a miserable GPRS connection that I knew I had a problem: I had contracted SAP – Search Addicted Programmer. I was an addict and I knew it, and looking around I could see that same frustration and craving etched on the faces of my fellow addicts.

We were all SAPs :) (I’m sorry I couldn’t resist that one)

In my next post, I attempt to quantify how addicted to search I and my team had become.
