Welcome

We are interdisciplinary researchers with diverse backgrounds based in the Department of Computer Science at the University of Victoria. Our offices are located in the Engineering/Computer Science building.

Our research interests include:

  • cognitive support and technology diffusion
  • human-computer interaction
  • human and social implications of technology use
    (social informatics)
  • interface design
  • knowledge engineering
  • software engineering
  • technology and pedagogy
  • visualization

Our primary objective is to develop tools that support people in performing complex cognitive tasks. Our projects benefit from the collaborative approach taken within our group and with other researchers. As a group, we operate by thinking creatively, exploiting our synergies, and applying innovative research techniques.

Contact us

We’re on Twitter and Facebook.

Collaborators

We frequently work with several other groups in the department:

20 hrs in Hacker News Top 10

Last weekend I was in beautiful sunny San Francisco for 3 days to present our paper on mining URLs posted by developers on Stack Overflow at MSR2013. Along with the paper we wrote, I also launched a website (linkedlists.net) that allows users to interactively explore our dataset.

Following my short talk I posted the link to Hacker News and then sat back and watched as my world exploded.

In about 5 minutes we were on the Hacker News front page, and in another 5 minutes we were at number 2, fighting a TechCrunch article about Hacker News for the top spot. While we didn’t (I think) get to number 1, we did stay in the Top 10 for almost 20 hours and only fell out of the top 50 a few days later.

The reaction has been, in a word, incredible. I don’t think any of us were expecting the level of interest and positivity that we received from the developer community. To give you an idea of what being in the Top 10 of Hacker News looks like from the perspective of a server, here is a screenshot of the requests per minute our AWS load balancer was dealing with.

[Screenshot: requests per minute handled by the linkedlists.net AWS load balancer]

Now I agree it’s not exactly Google scale, and there are plenty of small websites that easily handle worse every day, but for a small software engineering research project it’s pretty substantial.

As of right now Google Analytics is telling me that since launch we have had over 29K unique visitors and almost 36K page views. Importantly, we are starting to see people coming back, with our number of returning visitors currently at a couple of thousand and average time on the site creeping over the 3 minute mark.

I would like to thank everybody who helped with the research and with testing the site, the hundreds of people who helped spread the word on Twitter, and finally all those who submitted feature requests. We have lots of great ideas about what to do next, so please stay tuned.

MSR 2013

General Overview

MSR stands for Mining Software Repositories, a conference for researchers who want to share their data mining research findings. MSR has been held annually since 2004 and is always co-located with ICSE (International Conference on Software Engineering).

This year, MSR was held in San Francisco, California. To provide more opportunities to participate in the conference, the chairs introduced two new types of papers: Data papers and Practice papers. These papers encourage researchers to report their experience in organizational environments (positive or negative) and to share their data as repository files. The objective of these two paper types was to introduce the MSR community to organizational environments, inform the community about organizations’ needs, and show the community new sources of information that could be mined.

This year, MSR was a two-day event. The first day was divided into seven sessions:

  • Bug Triaging: In this session, the researchers showed how to improve communication in bug reports and how to automatically assign bug-fixing tasks to the correct developer or team.
  • MSR Goes Mobile: In this session, participants showed their findings from mining data from mobile sources such as cell phones, tablets and other mobile technologies.
  • MSR Challenge: In this session, the participants showed their findings from mining a predefined data repository. On this occasion, the data repository was Stack Overflow.
  • Changes and Fixes: In this session, participants worked with repositories of changes and fixes, trying to find the meaning behind these two activities.
  • Software Evolution: In this session, participants showed the importance of code clones and their findings from version repositories and code repositories.
  • Analysis of Bug Reports: In this session, participants showed their findings on bug reports, duplicate bug detection and non-committers in bug repositories.
  • Software Ecosystems, Big Data: In this session, participants showed their findings from mining different data repositories such as GitHub and GitGnome. Also, a new data repository, the Maven dependency repository, was presented to the community.

The second day contained six sessions:

  • Bug/Change Classification and Localization: In this session, participants showed how they classify and localize the changes and bugs, how the correlation between fixing bugs and crash reports could improve bug localization, and what kind of relationships exist between system properties such as size, stability and encapsulation.
  • Social Mining: In this session, participants identified important behaviours and characteristics of distributed developer teams, communication methods in open source developer teams, roles in open source projects, and others.

  • Search-Driven Development: In this session, participants showed their findings related to mining code, such as API usage patterns and support for code search.
  • 10 Years of MSR: In this session, participants analyzed the history of MSR, the topics presented, and the best practices that the community has developed over the years. They also presented a new repository with all the information related to the software engineering conferences.
  • Mining Unstructured Data: As its name implies, participants analyzed unstructured data, such as trace files, with the aim of understanding developer and system behaviours.

  • Predictor Models: In this session, participants showed their predictor models based on project analysis and social network metrics.

Between sessions, the conference chairs led a discussion about the topics of each session, in which participants who had questions about the presentations or a different point of view could express themselves freely but politely.

CHISEL at MSR

This year, CHISEL was represented by three papers in different sessions: Social Mining, Mining Unstructured Data, and the MSR Challenge.

In the Social Mining session, the paper presented was “Fixing the ‘Out of Sight Out of Mind’ Problem: One Year of Mood-Based Microblogging in a Distributed Software Team”. This work studies the challenges that distributed teams face and how they use microblogging tools to stay in touch and even to show emotion.

In the Mining Unstructured Data session, the paper presented was “Strategies for Avoiding Test Fixture Smells during Software Evolution”. This paper studies how fixture-related test smells evolve over time, based on the analysis of several thousand revisions of open source systems.

The last (but not least) paper was presented in the MSR Challenge session. “A Study of Innovation Diffusion through Link Sharing on Stack Overflow” is about how innovation is diffused on Stack Overflow through the sharing of resources (links).

Personal Impression

My personal impressions of this magnificent conference are:

  • MSR Community: The MSR community is friendly and polite. Its members are always looking for ways to grow the community and to build new research bonds within it. They are open to new ideas and to change when it is necessary.

In addition, the MSR community wants to improve communication with organizational environments and help them as they evolve. That is why the MSR community encourages all researchers to participate through a new kind of paper called the Practice paper.

  • Why is it so important to participate?: As researchers, we have to show our work to the community at this kind of event and receive feedback. Also, the relationships formed during the event’s dinner night can lead researchers to collaborate with other universities or organizations that are interested in the same topic.

10 Things to Do When You Are a Grad Student in Tech

Compared with undergraduate studies, graduate studies offer students a seemingly endless number of choices, ranging from supervisors to areas of research. To a reasonable extent, you can pick the courses that you take or TA, the methods that you use in your research, and the activities that contribute to your degree.

However, what many students easily forget is that the end goal is not getting your degree per se, but building the career you want. Not only research skills but also other skills gained through your graduate school experience can contribute to your future hireability. Nobody will tell you what to do and what you need. It is your responsibility to find opportunities and take action. Here is a list of things that you might like to consider if you are in grad school in a technical field:

  1. Do research

No, not the research you are thinking about. Research what the job market looks like in your domain. You want to work for Google, Apple, or Microsoft? Research what types of skills they are looking for and what types of projects they have. Better yet, don’t concentrate on one company. Keep your mind open. Think of your specialization, such as mobile applications, malware analysis, etc., and search for companies that look for relevant specialists.

  2. Start an open source project or contribute to one

Starting an open source project is a great opportunity to stay fit for hard core coding. Joining an existing project can save you some time and will ensure that you have something to show for it at the end. Remember that annoying bug in program X that you hate so much? Fix it.

  3. Volunteer

This is another way to help tech or any other communities. Whether it is organizing a workshop for first year computer science students or volunteering at a yearly marathon event, you can have a lot of fun and meet new friends. Are you one of those people who are uncomfortable helping others? Remember that volunteering looks pretty good on your resume, and do it anyway.

  4. Apply for scholarships

The first advantage here is obvious: money! Some scholarships, such as the Google Anita Borg Memorial scholarship, also give you an opportunity to visit corporate facilities and make contacts.

  5. Communicate

Talk to your supervisor, your peers, labmates, faculty members, etc. about your research and activities. Not only will you know whom to ask for help when you need it, but you will also see how everyone else is doing, and it might make you feel better about your progress (don’t count on it though).

  6. Go public

Advertise your achievements on all possible public media sites if your contract allows you to. You can publish videos on YouTube, tweet, write blogs, post on Facebook, add skills on LinkedIn, etc. Having a strong presence online with positive content might get you the job you always wanted.

  7. Find a mentor

Whenever things go wrong or you just need help, it is useful to have someone with more experience in your field, and not related to your research project, to give you advice. It could be another faculty member, a business person you met on a plane, a friend of a friend, etc.

  8. Help peers

Participate in Google Groups, go to Stack Overflow and answer questions. This is not only a great contribution to the tech community; some companies also look at your Stack Overflow reputation before hiring.

  9. Get fit

This is the toughest one, and it is important. It is a fact: fit people get better jobs. Having good health overall will help you in your stressful grad life, especially during sleepless nights before your defence and other deadlines.

  10. Have fun

Do not wait until you defend to have fun! Enjoy yourself as you are hammering through grad school because life is too short not to enjoy it. Keep yourself interested in technology, spend time with friends, do crazy things such as bungy jumping and skydiving, and get out and enjoy nature every once in a while.

Although this list is a good place to start, do not stop here. Look for other opportunities to make your graduate experience interesting and memorable.

How Google Became Your Company’s Most Valuable Programmer (Part 3)

This is a series of posts about how software development and programmers have become increasingly reliant on developer communities like Stack Overflow and search engines to help them develop and ship code. We think this is a fundamental shift in the nature of software development that has wide ranging implications.

Part 1 – A Story
Part 2 – Diagnosing SAP
Part 3 – How Many Programmers are SAPs
Part 4 – Causes & Effects of SAP (Coming Soon)
Part 5 – Refactoring the Programmer (Coming Soon)

How Many Programmers are SAPs
At a guess, tens of millions. But estimating the total number of Search Addicted Programmers (SAPs) is complicated by not really having an accurate estimate of the total number of programmers on the planet or, for that matter, an agreed way to define a programmer (see http://stackoverflow.com/questions/453880/how-many-developers-are-there-in-the-world and http://programmers.stackexchange.com/questions/19720/where-can-i-find-statistics-on-worldwide-developers-and-software-companies/20300#20300).

One quick way to get a lower bound on the number of SAPs is to look at the traffic to websites like Stack Overflow (http://www.quantcast.com/stackoverflow.com), which as of this writing has approximately 22 million unique visitors globally per month, 85M visits and 386M page views. These numbers are simply staggering. They not only blow estimates for the total number of “programmers” in the world out of the water but also point to a level of reliance on the Stack Overflow community and knowledge base that is honestly a little frightening.

And Stack Overflow is just one part of a much larger ecosystem of online resources which developers interact with and have become dependent on. Many languages and frameworks have their own communities with similarly large repositories of knowledge specific to their community. In our recent MSR 2013 paper (http://thechiselgroup.org/2013/03/27/a-study-of-innovation-diffusion-through-link-sharing-on-stack-overflow/), which analyzes the types of links that programmers post on Stack Overflow, we attempted to sketch out this online programmer ecosystem by analyzing the top 20 most frequently linked-to domains.

[Table: the top 20 most frequently linked-to domains on Stack Overflow]

We have only begun to analyze this data, and there are likely issues with some domains not being entirely programming related (microsoft.com, apple.com and google.com, for example), but given that these links are being posted to Stack Overflow we can assume that a large number of them reference programming-related topics. Unfortunately, as few of these domains publish their website traffic, this data does not really help us improve our estimate of the number of programmers that use these resources. However, it does give us a glimpse into the large and complex ecosystem of online resources that developers rely on and how developers share useful programming knowledge stores with other developers.
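For readers curious what this kind of domain count might look like in practice, here is a minimal Python sketch. It is not the pipeline used in the paper: the regular expression, the function name top_linked_domains and the sample posts are all assumptions made for illustration, and it groups links by host name (minus a leading "www.") rather than by registered domain.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Rough pattern for http(s) links in post text (an assumption, not the paper's rule).
URL_PATTERN = re.compile(r'https?://[^\s<>"]+')


def top_linked_domains(post_bodies, n=20):
    """Count how often each host appears in links found in the given post texts."""
    counts = Counter()
    for body in post_bodies:
        for url in URL_PATTERN.findall(body):
            # Drop punctuation that often trails a link in running text.
            url = url.rstrip(".,;:!?)")
            host = urlparse(url).netloc.lower()
            if host.startswith("www."):
                host = host[4:]
            if host:
                counts[host] += 1
    return counts.most_common(n)


# Hypothetical sample posts, purely for illustration.
posts = [
    "See http://msdn.microsoft.com/library/abc and http://www.github.com/user/repo",
    "Docs: http://developer.apple.com/guide, source at https://github.com/other/repo",
]
top = top_linked_domains(posts, n=2)
for host, count in top:
    print(host, count)
```

A real analysis would also need to decide how to group subdomains (is msdn.microsoft.com the same "domain" as microsoft.com?), which this sketch deliberately leaves open.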

This programmer dependency on the web is obviously not a new phenomenon. To plagiarize myself: “(email, bulletin boards, Usenet, IRC, the web) were most often first colonized by programmers. The emergence of Stack Overflow is the latest evolution of this historical trend with programmers inventing or adopting a technology to meet their need to discuss what it is they do with other programmers.” (Treude et al., 2011) http://ctreude.files.wordpress.com/2012/01/programming_in_a_socially_networked_world.pdf But what is new is the sheer scale of the phenomenon, both in terms of the numbers of programmers visibly participating in this ecosystem and the terabytes of stored programming knowledge indexed and made easily available through powerful search engines.

[Figure: growth in Stack Overflow registered users, visits and views from launch to early 2012]

This is a graphic I produced for our poster at the Future of Collaborative Software Development workshop at CSCW 2012 (http://brendancleary.com/2012/06/05/stack-overflow-fcsd-2012-poster/). It shows the growth in the number of Stack Overflow registered users, visits and views from its launch to early 2012. What’s really interesting about this graph is the explosion that started around September 2010, when the number of views started to grow relative to the number of unique visits. From a research perspective we don’t yet know what is causing this explosion or what it’s doing to programmers and how they write code. But we have some ideas, which we discuss in our next post.


How Google Became Your Company’s Most Valuable Programmer (Part 2)

This is a series of posts about how software development and programmers have become increasingly reliant on developer communities like Stack Overflow and search engines to help them develop and ship code. We think this is a fundamental shift in the nature of software development that has wide ranging implications.

Part 1 – A Story
Part 2 – Diagnosing SAP
Part 3 – How Many Programmers are SAPs
Part 4 – Causes & Effects of SAP (Coming Soon)
Part 5 – Refactoring the Programmer (Coming Soon)

Diagnosing SAP
A little while after coming to terms with my Search Addicted Programmer (SAP) condition, I got the opportunity to come and work on really cool software engineering problems with the Chisel group. I decided it was probably a good time to come clean about my SAP problem and to do it with numbers, by studying myself and some of my fellow SAPs. So I conducted a small informal study looking at three developers’ browser histories over 20 days (60 developer days) to quantify how often we used search engines for programming-related queries.

(Note: We were originally intending to make this into a full blown study but that project fell through. If anybody is interested in following up on this research let me know.)

table1

To keep things simple and to protect the privacy of the developers, we only looked at history entries that referenced a search engine, and each developer was allowed to review and edit their browser history beforehand.
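As a rough illustration of that filtering step, here is a minimal Python sketch. It is not the script we used: the list of search engine hosts, the function name extract_search_queries and the assumption that the search terms live in a "q" parameter are all hypothetical choices made for the example.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical set of search engine hosts; the study's actual list is not given here.
SEARCH_HOSTS = {"www.google.com", "www.bing.com", "duckduckgo.com"}


def extract_search_queries(history_urls):
    """Return the search terms of history entries that point at a search engine."""
    queries = []
    for url in history_urls:
        parsed = urlparse(url)
        if parsed.netloc.lower() not in SEARCH_HOSTS:
            continue  # everything that is not a search engine hit is ignored
        # Assumes the engine carries the search terms in the "q" query parameter.
        terms = parse_qs(parsed.query).get("q")
        if terms:
            queries.append(terms[0])
    return queries


# Made-up history entries, purely for illustration.
history = [
    "https://www.google.com/search?q=java+NullPointerException+in+HashMap",
    "https://example.org/some/private/page",
    "https://www.bing.com/search?q=python+urlparse",
]
queries = extract_search_queries(history)
print(queries)
```

Deciding which of the extracted queries are actually programming related still needs a human (or a classifier); the sketch only covers the "search engine entries only" filter.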

table3

table2

What we found showed just how prevalent the SAP problem had become on my team: almost 1000 programming-related searches in total, and several hundred searches per developer, in only 20 days. Averaging this out, we see that individual developers were executing 16 programming-related searches per day (at least 2 an hour over an eight-hour working day) while the team was executing almost 50.

To try and quantify what this meant in terms of developer time, I used a (finger in the air, we can argue in the comments) estimate of 5 minutes per query to approximate the number of developer hours per day spent searching. This is where things start to get interesting. Based on this small study of a small number of developers over a relatively short period of time, we get an estimate of over 1.2 hours per day per developer spent searching on the web, or over 4 hours for the 3 developers per day.
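The back-of-the-envelope arithmetic can be written down as a small Python sketch. The inputs are assumptions: the post only says "almost 1000" searches, so 1000 is used as a round figure, and the 5 minutes per query is the finger-in-the-air estimate from above. The function name search_time_estimate is made up for the example.

```python
def search_time_estimate(total_searches, developers, days, minutes_per_query=5.0):
    """Turn a raw search count into per-day search counts and hours spent searching."""
    per_developer_day = total_searches / (developers * days)
    per_team_day = total_searches / days
    return {
        "searches_per_developer_day": per_developer_day,
        "searches_per_team_day": per_team_day,
        "hours_per_developer_day": per_developer_day * minutes_per_query / 60,
        "hours_per_team_day": per_team_day * minutes_per_query / 60,
    }


# Round-number inputs (assumed): ~1000 searches, 3 developers, 20 days.
estimate = search_time_estimate(total_searches=1000, developers=3, days=20)
for name, value in estimate.items():
    print(f"{name}: {value:.2f}")
```

With these round inputs the sketch gives roughly 16.7 searches and 1.4 hours per developer per day, and about 4.2 hours per day for the team, which is in line with the figures quoted above. Changing minutes_per_query is the quickest way to see how sensitive the estimate is to that guess.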

table4

table5

Every time I look at these figures I am shocked and think 5 minutes per query is surely too much, but then I think about all the times that I have spent much longer than 5 minutes debugging some nasty problem with Google and Stack Overflow and think it could be much worse. Again, I have no empirical data for the 5-minute finger-in-the-air estimate (if you have a paper or are interested in working on it, let me know), but if anything it feels a little on the conservative side.

While we can debate the impact of search in terms of developer hours per day, the raw number of queries executed marks web search (and the developer communities like Stack Overflow to which it enables access) as a very important part of a software developer’s toolkit. In fact we can probably pretty safely state that developer communities like Stack Overflow, indexed and made available through powerful search engines, have become anticipated/required parts of the software development landscape, depended on not just by programmers but by software development companies and vendors alike. In the next post we try to estimate the numbers of programmers that are SAPs. (I’ll stop making that joke soon, I promise.)

(For a more in-depth look at programmer browser histories, Chris Parnin has since done a much more comprehensive study of the role of web resources in the practices of Android developers: http://blog.ninlabs.com/2013/03/api-documentation/)


How Google Became Your Company’s Most Valuable Programmer (Part 1)

This is a series of posts about how software development and programmers have become increasingly reliant on developer communities like Stack Overflow and search engines to help them develop and ship code. We think this is a fundamental shift in the nature of software development that has wide ranging implications, but first a little story.

Part 1 – A Story
Part 2 – Diagnosing SAP
Part 3 – How Many Programmers are SAPs
Part 4 – Causes & Effects of SAP (Coming Soon)
Part 5 – Refactoring the Programmer (Coming Soon)

A Story
One day about 2 years ago, while working at my previous company in the middle of a heavy development cycle, my entire development team and I lost our internet connection for about a day. That morning, as a project manager getting into work and finding no external network connection, I quickly did a run-through of my mental checklist of “Things We Need to Ship Code”©

  1. Source code repositories locally hosted (Check)
  2. Bug and task tracking database locally hosted (Check)
  3. Test & Development environment on local network (Check)
  4. No external dependencies on license servers or other stupid stuff (Check)
  5. Production servers hosted offsite with redundant net connections (Check)

I began to congratulate myself: my decision to resist the lure of hosting my code repos and bug tracking database online, and to eat the maintenance cost of self-hosting, had finally paid off. Here was the perfect argument for self-hosting: the network goes down, but you can still be productive and write code as if nothing had happened.

But as with all good stories, there is a twist. This isn’t the story about the fragility of networks or the dangers of trusting mission critical resources to external service providers. No, this is the story of how I learned that my mental checklist of “Things We Need to Ship Code”© was missing something very important, something without which my programmers and I could no longer effectively do our jobs.

Google.

But I am getting ahead of myself, so back to the story. After congratulating myself on my cleverness, and talking with the team to confirm my cleverness and devise a plan for the day, we sat down, pulled up our issues on the bug tracker and set to work. Everything seemed to be going along fine. I actually thought the lack of email etc. for the day might be a bit of a productivity boost. But as the day progressed I started to notice that I felt… slower. Everything was just that little bit more difficult, and bugs or features which I knew should have taken only a few minutes were taking much longer than I anticipated. The reason quickly became clear: every time I ran into an unfamiliar error or needed to look up the name of an infrequently used class or method, my first instinct was to reach for my friendly search engine. But there was no internet and no search engine, and I was less productive as a result.

Things I knew I could solve in seconds with a quick query were taking what seemed like forever. I had to use a debugger to debug stuff that I knew millions of other developers had already debugged, solved and posted to the web. It felt like I was coding with a blindfold on, wasting time reinventing or re-debugging things which I wouldn’t have had to if I could just run a quick search.

It was when I pulled out my phone to search for an exception on a miserable GPRS connection that I knew I had a problem: I had contracted SAP, Search Addicted Programmer. I was an addict and I knew it, but looking around I could see that same frustration and craving etched on the faces of my fellow addicts.

We were all SAPs :) (I’m sorry I couldn’t resist that one)

In my next post, I attempt to quantify how addicted to search I and my team had become.
