R Project Sprint.

An in-person collaborative event to contribute to base R.

Shannon Pileggi
2023-09-07
38 participants in-person in a classroom space: 1st row sitting on floor, 2nd row kneeling or sitting in chairs, 3rd row standing. Two remote participants shown on video conference in screen to the left of the group.

Figure 1: R Project Sprint participants, with remote participants shown on video conference. Not all participants photographed.

TL;DR

In August of 2023 I traveled to the University of Warwick in Coventry, England to participate in the R Project Sprint, where I worked collaboratively on contributions to base R alongside novice and experienced contributors, as well as the R Core Team.

About

The source code for base R is a mixture of R and C code. The bug fixes, maintenance, and enhancements of this code base are carried out by volunteers. The R Core Team makes the executive decisions; however, community members are still encouraged to contribute.

The aim of the R Project Sprint, as I perceive it, was to foster mentorship, collaboration, and personal relationships between novice and experienced contributors, lowering the barrier to one’s first contribution.

The purpose of this post is to describe my personal experience at this event; a comprehensive guide for how to contribute to base R is documented on the R Contribution Working Group website.

Applying to participate

Applications to participate in the event were due in March 2023, and later that month I was notified of my acceptance. The expectations for participants’ knowledge base coming into the event were nicely outlined in the application.

Prior to the event

Leading up to the event there was communication about optional things we could do to prepare, most of which is available at the R Contribution Working Group website. This included:

Participants

The R Project Sprint participants were impressively diverse both in terms of professional experience and geographic residence. A non-exhaustive list of countries represented at the in-person event includes Argentina, Brazil, Canada, England, Hungary, India, Nepal, the Netherlands, New Zealand, Nigeria, Oman, Senegal, Switzerland, and the United States.

Some individuals participated remotely via video conference, either by choice or circumstance. The combination of the United Kingdom’s air traffic control issues and train strikes caused delayed arrival for some.

During the event

The issues addressed at the event fell into four broad categories:

  1. Translation of messages, errors, and warnings.

  2. Improving documentation.

  3. Addressing bug reports, such as those related to the stats package.

  4. Enhancements, such as logging visual outcomes from base graphics calls.

Knowledge requirements for contributions

Depending on the task at hand, the technical requirements for contributions included:

Workflows for contributions

During the sprint, the progress on these items was tracked at https://github.com/r-devel/r-project-sprint-2023/issues. Either the initial report or the final resolution of these items may be seen in the R Bug Tracking System, known as Bugzilla.

Depending on the nature of the proposed change, the contributor may need to build R from source. The instructions for doing so in your personal computing environment are outlined in the R Patched and Development Versions chapter of the R Development Guide. Alternatively, one could build R from source without touching their local computing environment by using the R Dev Container in GitHub Codespaces (which I heard facilitated collaboration nicely as well).

Regardless of whether you needed to build R from source, proposed changes could then be tested on multiple computing platforms via a pull request to https://github.com/r-devel/r-svn, which mirrors the official base R server. After tests have passed, the contributor could extract the diff from the pull request to submit a proposed solution via Bugzilla; this process is more thoroughly documented in the Using a git mirror section of the R Development Guide.
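As one hedged illustration of the "extract the diff" step: GitHub serves a plain-text diff for any pull request when `.diff` is appended to the pull request URL. The sketch below only builds that URL. The helper name `pr_diff_url` is hypothetical, the PR number is a placeholder, and the R Development Guide remains the authoritative description of the workflow.

```r
# Hypothetical helper: build the URL of the plain-text diff for a pull
# request on the r-devel/r-svn mirror. Appending ".diff" to a pull request
# URL is a general GitHub feature, not something specific to R.
pr_diff_url <- function(pr_number, repo = "r-devel/r-svn") {
  stopifnot(length(pr_number) == 1, is.numeric(pr_number), pr_number > 0)
  sprintf("https://github.com/%s/pull/%d.diff", repo, as.integer(pr_number))
}

url <- pr_diff_url(123)  # 123 is a placeholder PR number
print(url)

# With network access, the diff could then be saved locally and attached
# to the corresponding Bugzilla report:
# download.file(url, destfile = "proposed-change.diff")
```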

Once a proposal is submitted, member(s) of the R Core Team review it. If the proposed change modifies the code base, additional checks are run against all packages on CRAN (of which there are currently ~20,000) to determine whether the change breaks any of them, which takes >14 hours to complete. If breaking changes are found, one possible resolution is to rewrite the code to avoid them. If that is not possible, the R Core Team would then assess whether the breaking changes are good breaking changes for packages (i.e., code could have been returning possibly incorrect results and should indeed break and be addressed) or whether they have too wide a reach and too little impact to be considered worth it. If it is decided that breaking changes should be enacted, the R Core Team notifies all authors of affected packages.

Bugs personally addressed

I spent most of my time at the Sprint on two bugs that genuinely intrigued me, as they were related to functions I had used often over the years.

  1. base::paste documentation, discussed in bug 17933 and tested in r-devel PR138. Opened in 2020, this issue already had substantial, nuanced discussion about the clarity and correctness of the documentation regarding the collapse and recycle0 behaviors. It took me a substantial amount of time to understand and evaluate the scenarios discussed and propose changes. After my initial proposal, I received several rounds of feedback from both fellow R contributors and the R Core Team, who provided additional suggestions and context for the documentation.

  2. stats::t.test bug, discussed in bug 14359 and tested in r-devel PR142. Opened in 2010, this issue again already had substantial discussion on both the implementation of the paired t-test and the examples shown in the documentation. This was again a collaboration among several individuals, over several rounds of iteration, to both modify the source code and improve the documentation.
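For readers less familiar with the arguments and test mentioned above, here is a small sketch of the behaviors involved. It is illustrative only and does not reproduce the specific documentation changes or the bug discussed in the reports. The `recycle0` argument assumes R 4.0.1 or later, and the built-in `sleep` data set is used purely as an example.

```r
# paste(): `collapse` joins the elements of the result into one string.
joined    <- paste(c("a", "b"), "x", sep = "-")                  # "a-x" "b-x"
collapsed <- paste(c("a", "b"), "x", sep = "-", collapse = "+")  # "a-x+b-x"

# With recycle0 = TRUE, a zero-length argument gives a zero-length result...
empty <- paste0("a", character(0), recycle0 = TRUE)              # character(0)

# ...unless `collapse` is also supplied, in which case the result is "".
empty_collapsed <- paste0("a", character(0), collapse = "+", recycle0 = TRUE)

# Paired t-test via the default (x, y) method: it is equivalent to a
# one-sample t-test on the within-pair differences.
x <- sleep$extra[sleep$group == 1]
y <- sleep$extra[sleep$group == 2]
paired_test <- t.test(x, y, paired = TRUE)
diff_test   <- t.test(x - y)
same_stat   <- isTRUE(all.equal(unname(paired_test$statistic),
                                unname(diff_test$statistic)))    # TRUE

print(list(joined, collapsed, empty, empty_collapsed, same_stat))
```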

Outcomes

For both me personally and for the wider R community, I view the event as a huge success. Many friendships were made, many collaborations were born, and many bug fixes and enhancements were implemented throughout the week.

Time zone differences and personal availability among the R Core Team and contributors can lead to time lags in communications, losing momentum for initiated issues. Having contributors in person together facilitated live and immediate feedback, allowing for faster iteration and completion.

I can now confidently either submit a new bug report or address an existing bug report. Moreover, I understand workflows for contributions, resources for help, and how to interact with the community for help should I get stuck.

I will also begin attending the R Contribution Working Group (RCWG) as a representative of R-Ladies to communicate RCWG highlights to the broader R-Ladies community and engage in RCWG initiatives as able.

I learned from R Core Team members who have been contributing to base R since 1997, and who knows, maybe I just shared a dinner, chatted over coffee, or walked around campus with a future R Core Team member. 💙

Acknowledgments

Infinite thanks to Heather Turner for organizing the Sprint, as well as the sponsors who provided the funding to make this event possible. Thank you to R Core Team members and fellow R contributors who traveled across the world to attend, and who kindly and generously shared their knowledge, expertise, anecdotes, and experiences. It was truly a pleasure. Lastly, thanks to Hannah Frick (a fellow Sprint attendee) for providing feedback on this post.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Pileggi (2023, Sept. 7). PIPING HOT DATA: R Project Sprint.. Retrieved from https://www.pipinghotdata.com/posts/2023-09-07-r-project-sprint/

BibTeX citation

@misc{pileggi2023r,
  author = {Pileggi, Shannon},
  title = {PIPING HOT DATA: R Project Sprint.},
  url = {https://www.pipinghotdata.com/posts/2023-09-07-r-project-sprint/},
  year = {2023}
}