Yuvaraj (Yuvi) | 2i2c

Announcing our public roadmap for open development

Fri, 06 Feb 2026 00:00:00 +0000

At the core of 2i2c’s service is a commitment to doing our work in a way that follows open principles and practices. We commit to doing all of our work in the open and only managing and developing open infrastructure. As part of this effort, we’re shifting our strategy to lean more heavily into co-creation with member communities and open source communities.

Today we’re excited to share a first step towards making our development process more participatory, transparent, and useful: we’re opening up our initiatives roadmap. You can find it here:

👉 2i2c.org/roadmap

You can find all of our roadmap initiatives in this GitHub repository (please comment and engage with us there!):

👉 github.com/2i2c-org/initiatives

Moving forward, the roadmap will be a key part of our service to member organizations.

Why we’re opening up our roadmap #

At 2i2c, initiatives drive most of our work. They represent major chunks of value with multiple steps needed to implement and unlock it. They range from making core infrastructure improvements to our member network, to making upstream contributions that enable new functionality on behalf of our member communities. While initiatives are generally public, they are spread across many places, and we’ve managed their prioritization, sequencing, and refinement in internal team spaces. This made it difficult for others to follow along, signal-boost, and potentially support initiatives they wanted to see done.

For this reason, we decided to build a public view of our initiatives roadmap. This reflects our current team priorities and what is coming down the pipeline.

We’ve also put our platform initiatives in a public repository. This gives us a public space for member communities (or anybody else) to discuss, collaborate, and support ideas in the open.

By opening this up, we hope to accomplish these goals:

Make it easier for everyone to see and influence our priorities.
Make it easier for member organizations to fund or collaborate on work.
Make it easier for open source communities to see what’s driving our contributions.
Make it easier to give credit to member communities that fund work.

In short, we want to model what sustainable open development can look like. Our hope is that this will both create more transparency and trust with our stakeholder communities, and invite them to tell us how we can best use our team capacity.

We hope to use a shared roadmap to funnel more resources into open source #

We also hope we can leverage this as a shared roadmap across our member communities that helps focus our attention and drive fundraising for our work. To begin, we’re inviting any member community to provide financial support for these items as a way to influence our timelines and priorities, and we’re exploring ways to facilitate fractional co-funding across our member communities to help share the cost of development across many organizations.

This is an early experiment in collaborative and radically transparent development, and we will iterate and learn as we get feedback from member communities. We’re excited to see where this goes!

Please give feedback on this idea. If you’re interested in being a member organization of 2i2c, reach out to us about membership.

Fixing the mybinder.org usage analytics archive

Tue, 14 Oct 2025 00:00:00 +0000

The analytics archive at archive.analytics.mybinder.org powers the mybinder.org usage dashboards and provides a daily-published dataset that researchers and communities use to understand how Binder is being used across different domains and scientific communities.

While updating our quarterly Binder impact report, we discovered the archive index page had stopped updating. The analytics publisher was writing index files to temporary storage before uploading to Google Cloud Storage, but for some reason the upload step stopped working. We deployed a fix that eliminates the temporary files entirely - the code now generates the HTML index as a string in memory and uploads directly.

The mybinder.org analytics archive shows a list of daily usage reports that anybody can download.

Fortunately, we didn’t lose any data! Thanks to some smart design decisions, the daily analytics files were being collected properly the entire time; only the index page listing them was broken. You can find the full archive here.
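To illustrate the shape of the fix described above, here is a minimal sketch of rendering an index page as an in-memory string and handing it straight to a storage client. It is not the actual publisher code: the function names, the file names, and the `FakeBlob` stand-in are hypothetical. The only real API assumed is the `upload_from_string(data, content_type=...)` method on a `Blob` in the `google-cloud-storage` Python client.

```python
from html import escape
from typing import Iterable, Protocol


class BlobLike(Protocol):
    """Anything exposing upload_from_string, e.g. a google-cloud-storage Blob."""

    def upload_from_string(self, data: str, content_type: str = ...) -> None: ...


def render_index(filenames: Iterable[str]) -> str:
    """Build the archive index page as a single HTML string (no temp files)."""
    items = "\n".join(
        f'<li><a href="{escape(name, quote=True)}">{escape(name)}</a></li>'
        for name in sorted(filenames)
    )
    return f"<!DOCTYPE html>\n<html><body><ul>\n{items}\n</ul></body></html>\n"


def publish_index(blob: BlobLike, filenames: Iterable[str]) -> str:
    """Render the index in memory and pass it directly to the storage client."""
    html = render_index(filenames)
    blob.upload_from_string(html, content_type="text/html")
    return html


class FakeBlob:
    """Hypothetical in-memory stand-in so the sketch runs without cloud credentials."""

    def __init__(self) -> None:
        self.data = None
        self.content_type = None

    def upload_from_string(self, data, content_type="text/plain"):
        self.data = data
        self.content_type = content_type
```

Because nothing touches the local filesystem, there is no intermediate state that can silently go missing between "write" and "upload". In a real deployment the `FakeBlob` would be replaced by a blob obtained from a storage bucket.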

Learn more #

Pull request with the fix
mybinder.org usage dashboards
The binder-data/ repository is where we aggregate and publish archive data to be more accessible.
Our quarterly impact report from mybinder.org

Acknowledgements #

Thanks to the JupyterHub community for their collaboration on mybinder.org infrastructure.

Combating TCP scanning on mybinder.org with the tcpflowkiller

Wed, 08 Oct 2025 00:00:00 +0000

We’ve deployed a new tool to mybinder.org that automatically detects and stops port scanning activity, helping us maintain service reliability while being responsible citizens of the internet.

Port scanning is a common part of network-based exploits, and many server hosts prohibit this activity (including Hetzner, where the 2i2c mybinder.org infrastructure lives). We developed a little tool called tcpflowkiller as part of the cryptnono project (our anti-abuse set of tools for hosted JupyterHub and Binder infrastructure) to automatically kill processes that exhibit port scanning behavior. This reduces the likelihood of triggering our server host’s abuse policies and helps keep mybinder.org running reliably.
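As a rough illustration of what "port scanning behavior" can look like to a detector, here is a minimal sketch that flags a process once it has contacted too many distinct destinations within a short sliding window. This is not how tcpflowkiller is implemented: the class name, the thresholds, and the idea of feeding it one connection attempt at a time are all assumptions made for the example.

```python
from collections import defaultdict, deque


class ScanDetector:
    """Flag a process that contacts too many distinct destinations in a short window."""

    def __init__(self, max_destinations: int = 100, window_seconds: float = 10.0):
        # Both thresholds are hypothetical defaults chosen for illustration.
        self.max_destinations = max_destinations
        self.window_seconds = window_seconds
        # pid -> deque of (timestamp, (ip, port)) connection attempts
        self._events = defaultdict(deque)

    def observe(self, pid: int, ip: str, port: int, now: float) -> bool:
        """Record one outbound connection attempt; return True if pid looks like a scanner."""
        events = self._events[pid]
        events.append((now, (ip, port)))
        # Drop attempts that fell out of the sliding window.
        while events and now - events[0][0] > self.window_seconds:
            events.popleft()
        distinct = {dest for _, dest in events}
        return len(distinct) > self.max_destinations
```

A process that repeatedly talks to the same server never trips the detector, while one that sweeps across many ports or hosts in quick succession does. A caller could then terminate the flagged process.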

Why this matters #

As providers of public compute, it’s our responsibility to make sure people can’t use our infrastructure to abuse others. This is part of being responsible citizens of the internet. It also saves us time in dealing with outages because cloud providers (understandably) block access when they suspect there is abuse.

Hetzner and similar hosts have many benefits (including significant cost savings), and tools like tcpflowkiller help keep hubs and binders running smoothly on such hosts, which have different abuse policies than the big commercial cloud providers.

AWS and other cloud providers have proprietary ways to combat abuse (like AWS GuardDuty). We could have spent our time investing in developing rules there. Instead, contributing to cryptnono helps provide the same set of features in a cloud-agnostic way, in line with our principles of supporting open infrastructure that gives communities control over their infrastructure.

This tool has now been deployed to mybinder.org, and we’ll monitor its effectiveness over time. We may roll this out to 2i2c public BinderHubs in the future based on patterns we observe.

Acknowledgements #

Thanks to GESIS for their continued support of mybinder.org and to Raniere Silva for collaborating on this deployment with us.
More reliable Binder infrastructure is also supported by NASA Open Science / Science Core, whose tutorials run on the opensci binders that depend on this same anti-abuse stack.

From scattered effort to strategic impact: How we're systematizing our Foundational open source contributions

Fri, 26 Sep 2025 00:00:00 +0000

Over the past year we’ve experimented with being more strategic about supporting upstream communities as a team. This post summarizes our current plan, including team targets and practices we’ll continue to pilot. We’ll revisit this as we learn more.

Note: This document is about the Foundational contributions we make so that open source communities are healthier and more impactful. It is not about Directed upstream contributions we make as part of our own product work. See On being a good open source citizen: supporting a healthy ecosystem through directed and foundational contributions.

The challenge: Why scattered individual efforts aren’t enough #

Healthy open source communities rely on both individual and institutional contributions. 2i2c aims to be an excellent “upstream citizen”, so we need a structured approach with clear goals and rationale for why it’s the best use of our team’s time.

Without a coordinated approach, we risk two problematic outcomes:

Best case: Scattered, individual efforts that are subject to the Tyranny of Structurelessness. We help at the margins but not meaningfully.

Worst case: Our organizational capacity inadvertently dominates communities, making 2i2c the sole stakeholder capable of meaningful development and maintenance. We functionally take over the project.

By setting explicit goals, both our member communities and upstream projects can hold us accountable for actions that strengthen rather than undermine community health.

Our long-term goal: Multi-stakeholder, resilient communities #

With this in mind, we’ve chosen the following outcomes as our major goals for upstream contribution:

We want the Jupyter¹ community to be a multi-stakeholder², diverse³ community with a very high bus factor, because we believe this is a critical prerequisite for advancing our mission and value proposition.

We want to build team processes that help upstream communities make progress towards this goal, so everyone can equitably participate with the support they need.

Two key objectives #

Starting with JupyterHub, we’ve identified two objectives that will guide our work:

Objective 1: Increase the number of casual but returning contributors to the JupyterHub community

Objective 2: Increase the number of total maintainers in the JupyterHub community

We’ve chosen these objectives because (1) they have impact, (2) we can make meaningful progress on them, and (3) we can integrate this work into our team’s workflow.

For each activity below, we’ve brainstormed some Key Performance Indicators (KPIs) to track progress and ensure we’re learning effectively.

Four pilot activities #

We’ll experiment with these four activities⁴:

Review pull requests from non-maintainers
Issue Triage office hours
Sponsoring and Mentoring new Maintainers
Increase bus factor and diversity of people making releases

Review Pull Requests from non-maintainers #

Imagine two different scenarios:

1. You casually contribute a PR to some OSS project. Someone responds the next day, you have a pleasant back and forth, and it gets merged (or rejected) within a few days.
2. You casually contribute a PR to some OSS project. Nobody responds for a year. Eventually someone leaves a comment. You have forgotten everything, and don’t even respond. Much later, your PR gets closed as stale.

Which experience will encourage you to come back and contribute again?

It’s clearly (1). We should use our institutional capacity to bring the community closer to (1).

We’ll accomplish this by including the following work item in every sprint:

Review of N PRs by non-maintainers of JupyterHub

We will build skills (via pairing, training, etc.) inside 2i2c, as not everyone will feel comfortable reviewing pull requests for all projects, nor have the rights to merge or close PRs. We may also do additional work like new contributor drives, better documentation, and policy advocacy. We will include pull requests of all types, not just code contributions.

KPIs #

We imagine two KPIs for this activity:

Number of PRs merged (or closed) through our sprint planning activity.
Number of returning contributors whose PRs were reviewed by us.

Issue Triage office hours #

Issue Triage involves combing through an upstream repository’s issue tracker, engaging with new issues, refining them to be actionable, and signal boosting important ones for team action. This is hard for newcomers, as it often requires deep knowledge of various components to understand how to direct an issue or refine it. It’s also challenging for team members still learning open source community dynamics. We’d like to upskill our team members within 2i2c and our upstream open source communities.

As part of our sprints, we will run regular “Issue Triage” office hours. We’ll begin by upskilling our own 2i2c team members in effective issue triaging. We’ll then explore opening issue triage sessions to the broader upstream community.

KPIs #

Number of issues triaged by 2i2c team members.⁵

Sponsoring and Mentoring new Maintainers #

OSS communities must grow their contributors into maintainers, or they will die.

XKCD comic about dependency

Growing new maintainers takes time and effort from both the potential maintainer and existing maintainers who mentor and sponsor them. The focus on sponsorship is important, as laid out by Lara Hogan. This work takes years, not months, to manifest.

We will build structures to identify potential maintainers and create pathways for them to gain maintainership status. As JupyterHub lacks an explicit maintainer pathway, we will build our own process via these focus areas:

1. Identifying potential candidates for maintainership.
2. Identifying community work they can do to get involved (fixing bugs, reviewing code, triaging issues, answering questions, contributing code or documentation, managing releases, etc.).
3. Building pathways for candidates to do (2) as appropriate.
4. Iterating until candidates have done ‘enough’ work to gain maintainership status.

This work is nebulous but worthwhile. We will coordinate this effort closely with community leaders, recognizing it takes time to actualize.

In the Jupyter community, maintainership status is tied to individuals, not to organizations they work for. Nobody should get maintainership status simply because they work for a specific organization (such as 2i2c). We should look for diverse candidates, ideally funded by different organizations, who are interested in becoming maintainers.

Note: We’d also like to start with individuals in our collaborator network. For example, we’re using an engagement between NASA VEDA and Development Seed to onboard several team members into these projects.

KPI #

This measurement moves slowly, but is very clearly impactful:

Number of people who have become maintainers due to our concerted efforts.

Increase bus factor and diversity of people making releases #

Making releases is often thankless but important to community health. It involves coordinating testing, writing changelogs, and providing upgrade instructions. Institutions can help by dedicating team time to perform this task regularly. To advance the ‘multi-stakeholder’ and ‘high bus factor’ aspects of our goal, we will have many different people do releases, via mentorship and sponsorship. This will integrate into our regular workstreams.

KPIs #

Number of releases performed by 2i2c engineers
Number of releases performed by others with sponsorship / mentorship from 2i2c engineers

Criteria for upstream projects to support #

Our long-term goal applies to upstream communities that:

1. We strategically depend on to serve our member communities as part of our community hub service
2. We need to help sustain, given upstream community dynamics
3. We have the ability to help sustain

For example, Kubernetes satisfies (1) but not (2) or (3), while JupyterLab meets (1) and (2) but not (3) (presently). Currently this policy only applies to JupyterHub, but may change as our organization evolves.

How we’ll implement this #

Who is responsible #

Implementation is the responsibility of 2i2c’s Product & Services team. These activities must integrate into the team’s daily practices, not become an external shadow process for some members.

How we’ll fund this work #

Foundational upstream support requires significant work and expertise. We plan to fund this through:

Fees from our member communities. A percentage of our membership fees includes covering the cost of Foundational contributions like this.
Targeted contributions from some of our collaborators. Some collaborators have funds and want to support open source at a foundational level. In some cases we use funds from these collaborators to cover our costs.

We still need to explore what these efforts cost and mechanisms to recover those costs.

Next step: Learning in public #

We’re excited to experiment with more effective upstream contribution and eager to learn. We’ll share our experiences so others can learn from and comment on our process.

Acknowledgements #

@MinRK and @bsipocz for helping review a draft of this!
@choldgraf for feedback, guidance, and editing for this post and the team practices in it.
JupyterHub, JupyterBook, and Project Jupyter for teaching us a lot about open source over the years.

¹ Currently this is particularly JupyterHub and Jupyter-wide leadership. We’re exploring how to incorporate JupyterBook into our service and are thus investing Foundational contributions there as well. ↩︎
² With different kinds and sizes of organizations (companies, non-profits, universities, etc.) and individuals being stakeholders. We want to avoid a single organization monopolizing power within any community. ↩︎
³ Across the power spectrum - from users to bug reporters to casual contributors to maintainers to people on governance duty. ↩︎
⁴ Implementation note: We will not start doing all of these immediately! We will consult with the rest of the team, and start these one at a time so we can build these processes sustainably and equitably. ↩︎
⁵ This requires a definition of “an issue that has been triaged”, and to our knowledge no such definition exists. We’d like to learn how to measure something abstract like “issue triage” - perhaps it is something specific like putting the issue on a board for further action or applying a label, or something more abstract like “increasing how clear and actionable the issue is”. We’ll explore this when we start to make progress towards this objective. ↩︎

On being a good open source citizen: supporting a healthy ecosystem through directed and foundational contributions

Wed, 03 Sep 2025 00:00:00 +0000

Any organization building on open source faces a fundamental tension: how do you serve the needs of your organizational stakeholders while also acting as a responsible steward of the upstream projects you depend on? This is harder than it looks - simply “making PRs” leaves a number of open source needs unaddressed, and can burn out both your team members and the open source maintainers. We think about this a lot at 2i2c, and want to share our framework to navigate this challenge intentionally.

Here are a few questions we’ve been grappling with:

How do we tie general upstream maintenance to value delivered to our user communities?
How can we scope upstream support so that it doesn’t detract from our service needs and product strategy?
How can we encourage team members to work on the most impactful aspects of upstream support?
How can we intentionally and equitably support open source communities as a team, rather than a collection of individuals?

Along the way, we realized there are two very different kinds of upstream contributions:

Directed Contributions: A contribution driven by the needs of our member communities and product roadmap. We call these “Directed” contributions because they address a targeted need driven by one stakeholder (us!).
Foundational contributions: A contribution driven by the needs of the upstream community. We call these “Foundational contributions” because they’re meant to provide the healthy foundation on which a community can operate and grow.

Historically we have conflated these types of contributions, but we think it’s key that we treat them differently.

Note: For a more practical guide that describes the systems we’ve set up to accomplish Foundational upstream contributions, see From scattered effort to strategic impact: How we’re systematizing our Foundational open source contributions.

Everybody has an open source hat and a stakeholder hat #

Open source teams¹ are usually two kinds of teams that overlap heavily:

A collection of stakeholders working together on the open source project, each with their own goals and interests.
An open source team with a shared goal and strategy for the open source project.

In this case, stakeholders can be individuals or companies. They use and contribute to the open source project because it advances their own interests. For example, an enthusiast contributing to a project because it brings them joy, or a company contributing to a project because they build a product that depends on the open source technology.

However, for open source projects to be successful they also need their own unique identity, goals, strategy for impact, and system of work. This allows a diverse collection of stakeholders to work together effectively and create impactful technology. This team is made up of the same stakeholders described above, but with a responsibility to lead and support the open source team, rather than just serve their individual interests as stakeholders.

Thus, any open source stakeholder has two hats: they are both representatives of a stakeholder and members of an open source team. While it’s possible to align the interests of these two groups, we think it’s still important to distinguish between them.

Directed Contributions benefit the stakeholder you represent #

A Directed Contribution is primarily driven by the needs of a stakeholder in an open source project. To use 2i2c as an example, let’s take a quote from 2i2c’s value proposition:

2i2c serves a global network of community hubs for interactive learning and discovery

Community here does not refer to open source upstream software provider communities (like JupyterHub or Kubernetes), but instead to downstream user communities (like CryoCloud, Openscapes, or NASA VEDA).

When 2i2c makes a Directed Contribution, it means we are trying to deliver value to one or more of our member communities by making an upstream contribution.

Satisfying community needs often involves directly working on the software they use. Driven by our right to replicate principles, we mostly do this not by building software that is proprietary to 2i2c or permanently owned solely by us, but by contributing to an upstream software community. These are all Directed Contributions.

Some illustrative examples:

Allow login to be gated on OAuth2 granted scopes was a feature we added to support one of our communities’ auth flow (EarthScope).
Changing how .pyc files are kept in images was work we did as a result of a support ticket investigating spawn timeout issues in the LEAP hub.
Adding landing pages functionality to Jupyter Book and MyST was work we did to support member communities like CryoCloud and Project Pythia.

The fact that these are open source contributions is incidental. We are primarily doing this work to deliver value to our community network.

We plan Directed Contributions according to our roadmap and member feedback #

Directed Contributions naturally align with 2i2c’s overall goals and strategy, so we use our product processes for planning and delivering on them. However, we also want to provide transparency to upstream communities so that they understand who is driving the contributions that we’re making.

With that in mind, here are a few ways that Directed Contributions relate to our practices:

Directed Contributions should be defined by our product roadmap and prioritization processes.
We allocate engineering time for these upstream contributions as part of our product lifecycle, including the extra coordination and communication work needed to work at the pace of the upstream community.
We cross-link 2i2c product initiatives to upstream issues and pull-requests wherever we can to provide transparency about why we’re making a contribution.
We communicate this work via our blog so that 2i2c’s member communities know about the contributions we’ve made on their behalf.

Foundational Contributions support a healthy open source community #

However, contributions can’t always be driven by a stakeholder’s needs or the open source team will not have an identity or support structure of its own. Here’s another excerpt from our value proposition:

We need infrastructure services that are driven by community needs and values, that follow the same open source science practices we wish to see in others, and that believe in the power of shared community resources and knowledge.

Being a “healthy upstream citizen” is core to 2i2c’s mission, and is also a way to help communities we rely on remain healthy. Some of our contributions should be Foundational rather than Directed. This means doing things that keep the overall ecosystem healthy even if they do not directly address a specific member community need. The presence of a healthy open source ecosystem is a value to our member communities in and of itself.

Defining “Foundational” needs is difficult, because open source teams tend to have less structure and formally-stated goals and needs than most organizations. In 2i2c’s case, we focus our Foundational Contributions around maintaining the health of the open source ecosystem.

It includes things like:

Grow and guide new contributors to grow team capacity
Help make releases
Provide code review
Fix broken CI
Write documentation and tutorials
Manage and run meetings
Align open source teams on goals and strategy

However, the real point is that these actions need to be driven by the upstream project’s goals and needs, not by 2i2c’s needs.

Here are a few common examples of contributions that are not considered Foundational for our team:

Opening a PR to add a major feature to an upstream project.
Creating a brand new project in an open source organization in order to scratch your own itch.
Engaging in reactive open-source work that isn’t driven by a clear strategy or goal (e.g., randomly responding to the last few GitHub issue comments you happened to notice)

We plan Foundational Contributions alongside our engineering roadmap #

Foundational Contributions are important to 2i2c both for strategic and tactical reasons. However, when this work is left as unstructured time (as it has been historically), it runs into all the problems of unstructured work - it happens in non-strategic ways, it isn’t evenly balanced across team members, it is more or less accessible depending on your personal comfort level and skills, etc.

With that in mind, here are a few ways that Foundational Contributions relate to our practices:

We need to own Foundational Contributions as a team, rather than asking individuals to identify and do this work on their own.
We need to define team goals and strategy to define the impact we want to have, and what kind of work leads to that impact.
We need a team system for identifying and prioritizing the most impactful Foundational Contributions to perform.
This system must spread the responsibility of Foundational Contributions across our whole product team.
It means we need to give people support and training to do this effectively. For example, helping team members grow into roles that involve upstream work, rotating certain types of contributions across team members, etc.

To ensure this work is intentional and equitable across our team, we encourage Foundational contributions to happen within this framework. Contributions that fall outside of it are treated as valued, but separate, personal contributions.

What’s next #

By distinguishing between Directed and Foundational contributions, we can align and balance our immediate product needs with our long-term commitment to community health. We believe this framework allows organizations like ours to be better partners. We’d love feedback about this process, how we can improve it, and what others have learned along the way.

¹ By “open source” we are focusing on multi-stakeholder open source projects with participatory and inclusive leadership and contributions. This wouldn’t apply to an organization- or person-specific open source project. ↩︎

Sharing JupyterHub's vision for more flexible application deployment at the doepy talk series.

Wed, 03 Sep 2025 00:00:00 +0000

Our Technical Lead Yuvi Panda recently gave a talk at the doepy meetup about JupyterHub’s interest in moving beyond the “single-user notebook application” and towards a more flexible approach to enabling administrators to deploy many different types of applications and environments.

Check out a video of the talk here:

This is an important step for the JupyterHub project in order to support the many different kinds of workflows that data scientists need to use in their work. We hope that this generates more interest in the JupyterHub project and gives us useful feedback to guide the team’s understanding of this direction.

Learn more #

Acknowledgements #

The doepy team for inviting Yuvi to give this talk.
The JupyterHub team for working with us on this strategy.
2i2c’s network of member communities whose fees support our Foundational open source engagement.

Solving classes of problems, rather than just an instance of a problem (with an example)

Mon, 09 Jun 2025 00:00:00 +0000

The Problem #

Two of the communities we serve (NMFS Openscapes and CryoCloud) reported issues with starting GPU nodes on their hubs. Upon investigation, I discovered that the cluster autoscaler seemed to have suddenly stopped recognizing that GPUs were available in the cluster at all, and hence wasn’t provisioning the nodes. A restart of the cluster-autoscaler pod fixed the issue for both these communities.

An incomplete solution #

But is that the end of the story? Not if we want to provide reliable long term infrastructure to communities with minimal toil on the part of 2i2c engineers!

One of the engineering principles I’m trying to have us more intentionally and structurally embody is that we fix whole classes of problems, rather than just individual instances of a problem. Fixing the immediate issue is not enough - we need to understand what class of issues was manifesting itself in this particular fashion, and fix that.

What was the class of issues we could fix here? #

Digging in, I realized that our version of cluster-autoscaler was a little behind the latest release. I presumed this was a bug in cluster-autoscaler (a restart fixed it, which suggests a bug in how it tracks state). To me, the class of problem here is that we were not rolling out releases to our “supporting infrastructure” fast enough. Perhaps if we had been on the most recent cluster-autoscaler release, this issue would never have happened.

Additionally, this failure to scale up was reported to us by the community rather than by an automated alert. We should change that too!
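As a rough sketch of what such an automated check could look for, the snippet below flags GPU pods that have been waiting to be scheduled for longer than a threshold. Everything here is hypothetical and for illustration only: the `PendingPod` record, the ten-minute threshold, and the function name are assumptions, not the alert we will actually build.

```python
from dataclasses import dataclass


@dataclass
class PendingPod:
    """Hypothetical record of a pod that is waiting to be scheduled."""

    name: str
    pending_since: float  # Unix timestamp when the pod entered Pending
    requests_gpu: bool


def stuck_gpu_pods(pods, now, max_pending_seconds=600.0):
    """Return names of GPU pods that have waited longer than the threshold."""
    return [
        pod.name
        for pod in pods
        if pod.requests_gpu and now - pod.pending_since > max_pending_seconds
    ]
```

A non-empty result would be the signal to notify an engineer, so that a failure to scale up is noticed before a community has to report it.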

Structured solutions #

We follow a two week sprint cycle, and I love the (hard won) structure it provides us. I don’t want to arbitrarily start doing work that upsets prior committed work from that structure. However, we also treat support requests seriously and try to work them into the sprint. So I timeboxed myself for one hour, and saw what I could accomplish. Turns out, a lot!

I upgraded all our support components, tested them, and rolled them out to all our communities! This included upgrading Grafana, Prometheus, nginx-ingress as well as the cluster-autoscaler. This also restarted the cluster-autoscaler across our clusters, fixing this issue for any other communities that had it.
I re-enabled the automatic once-a-month PR for upgrading these support components. We had switched to doing them on a manual sprint cadence, but clearly that was neither fast enough nor automated enough. We will instead work these into the sprint once the bot opens the PR. Credit to Erik Sundell for initially setting this up.
I created an issue to track the alert creation, and put it in our sprint backlog.
(In an additional fifteen-minute timebox) I wrote this blog post, to communicate both to the affected communities and to others what we have done.

By timeboxing myself, I didn’t upset our sprint cadence and was able to continue doing other work I had committed to in the sprint, while also fixing this class of issues to the best of my ability.

Moving forward #

While we have been implicitly trying to solve whole classes of issues rather than individual instances of an issue as a team for a while, I want us to explicitly do it from now on. Communicating this out to our communities is an important part of that, as is internal team training. This blog post is the former, and we are continually working on the latter :)

Acknowledgements #

Thanks to the Openscapes and CryoCloud communities for working with us closely on infrastructure to identify improvements like this.

Simplifying and speeding up Binder builds with BuildKit

Mon, 03 Mar 2025 00:00:00 +0000

Chris and Yuvi recently wrote a blog post on the Jupyter blog about a recent experiment to significantly reduce the cost of running a node on the mybinder.org federation.

Acknowledgements #

Project Pythia and NASA Open Science / ScienceCore provide support for some of our work with the Binder project.
JupyterHub for working with us to get this new node deployed for mybinder.org.

2i2c joins the mybinder.org federation with a cheaper and faster way to deploy BinderHub

Wed, 29 Jan 2025 00:00:00 +0000

If you’re interested in supporting mybinder.org with cloud resources, financial resources, or human resources, please see the Support Binder page for how you can help.

tl;dr: The 2i2c team is joining the mybinder.org federation with a single-node BinderHub instance at 2i2c.mybinder.org. It should be much cheaper to run than auto-scaling Kubernetes clusters, and might be a good way to support mybinder.org more sustainably. For questions or comments, join this Jupyter Zulip thread.

mybinder.org is a massive public service for creating and sharing reproducible computational environments. It is managed by the JupyterHub team and members of the mybinder.org federation. One challenge in running mybinder.org is identifying cloud credits or financial resources to support the cloud infrastructure that runs the service. Two years ago, Google stopped supporting the mybinder.org federation with cloud credits, and last month the federation lost more capacity, leaving only GESIS and OVH as remaining federation members¹. This makes mybinder.org less reliable, slower, and generally less useful to the world.

The landscape of cloud infrastructure technology and services has changed considerably, and we think that there’s a way to deploy BinderHub instances with lower costs and less complexity. We’ve accomplished this by deploying a single-node Kubernetes cluster on a VM provider that is much cheaper, now running at 2i2c.mybinder.org. This both relieves Binder’s short-term capacity shortage and may provide an easier pathway for others to support the project in the future.

Below, we’ll describe what has changed to enable this, what we’re deploying, and what the impact should be.

Cloud infrastructure has become cheaper and more commodified #

A key theory of mybinder.org (and 2i2c) is that commercial cloud infrastructure will be commodified over time – what begins as cutting-edge functionality will become commonplace and offered across all cloud providers. As a result, costs will go down over time. Abstractions like Kubernetes will allow you to easily migrate workflows and infrastructure between cloud providers. As a result, you’ll be able to easily follow those costs where there are better options. That’s essentially what is happening here.

There are two key changes that make it much easier to deploy a BinderHub instance at a fraction of the cost:

First, Kubernetes has matured and become easier to deploy. When mybinder.org started, it was using the cutting-edge of Kubernetes functionality. This meant that we needed to use cloud providers that provided a managed Kubernetes service to deal with this complexity. A managed Kubernetes offering tends to be expensive, offered by only a few cloud providers, and thus raises costs across-the-board for the provider that offers it.

However, this was almost a decade ago, and Kubernetes has become both more functional and more stable. There are now many more ways of running Kubernetes, especially for simpler workflows that don’t require autoscaling. In the last several months, we’ve been experimenting with single-node Kubernetes workflows via K3s². K3s is a lightweight Kubernetes distribution that is much easier to deploy and manage. It’s designed for things like edge computing and low-resource environments, and it can be deployed with a single script!

By running a Kubernetes cluster on a single node, we don’t need a “managed Kubernetes service”, which means we can choose from a much larger pool of infrastructure / cloud providers. If all we need is a running VM, this is something the tech industry has been doing for decades.

Second, Managed Object Storage services have more open source options, and are more commodified and cheaper. In addition to Kubernetes, the other thing that BinderHub needs is a way to store and retrieve images for the environments that it builds. This also used to be a fairly complex problem, and thus required managed solutions from cloud providers that charged a premium for their service. However, a number of open source object storage solutions have emerged and made it much easier for providers to support this workflow.³ Because these are open source, infrastructure providers can provide managed object storage at a fraction of the cost.

Because of these two things, we’ve learned that we can run a BinderHub instance on a single VM from a much larger pool of infrastructure providers. This means we should be able to run BinderHub instances at a fraction of the cost.⁴

Deploying BinderHub on a single-node VM is cheaper and simpler #

Last week, we deployed 2i2c.mybinder.org, a single-node Kubernetes instance on Hetzner Cloud using K3s. This will run on a single-node VM, with a Kubernetes instance that is entirely managed by us, and with managed object storage from Hetzner. Compared to other cloud providers, it is around 5x cheaper per month.

Comparison of rough monthly costs across different cloud providers for similar VM instances. These are rough estimates based on cloud provider pricing pages for an on-demand VM with around 190GB RAM. Pricing pages: Hetzner Cloud ~$300, Microsoft Azure ~$1,300, Google Cloud Platform ~$1,500, Amazon Web Services ~$1,600.
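To make the comparison concrete, here is a small Python calculation using only the rough monthly figures quoted in the caption above. The prices are the same coarse estimates, so the resulting ratios are indicative at best.

```python
# Rough monthly prices in USD, taken from the figure caption above.
monthly_cost_usd = {
    "Hetzner Cloud": 300,
    "Microsoft Azure": 1300,
    "Google Cloud Platform": 1500,
    "Amazon Web Services": 1600,
}

baseline = monthly_cost_usd["Hetzner Cloud"]

# How many times more expensive each provider is than the Hetzner VM.
ratios = {
    provider: round(cost / baseline, 1)
    for provider, cost in monthly_cost_usd.items()
    if provider != "Hetzner Cloud"
}

for provider, ratio in ratios.items():
    print(f"{provider}: ~{ratio}x the Hetzner price")
```

With these numbers the other providers come out at roughly 4.3x to 5.3x the Hetzner price, which is where the "4x" and "5x" figures in this post come from.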

Running a single-node Kubernetes instance will be a cheap and effective way to handle a lot of mybinder.org’s capacity needs. Because it’s a single-node cluster, there is no auto-scaling (one reason it is so cheap), which reduces a lot of the complexity we’ll have to manage. These are acceptable tradeoffs for a service like mybinder.org, which runs entirely ephemeral sessions with very limited resources and no promises about uptime, persistence, etc.

You might be wondering: “I thought Kubernetes was supposed to save money.” Normally, running Kubernetes for scalable workflows does save costs because you can scale infrastructure to match your capacity needs. Without scaling, you’d need to provide a VM that can always handle your maximum capacity needs (and pay for the costs the entire time). With Kubernetes, you can request and remove nodes to grow your capacity as-needed (and save money doing so). It looks something like this:

The cost difference between a single large VM vs scalable nodes. Given variable usage over time, Kubernetes allows you to scale your cost up and down with need, which is more efficient than paying for a single VM that can withstand your maximum capacity.

However, there is a built-in cost you pay when you use a service that provides managed Kubernetes. Managed Kubernetes services are complex and expensive, and this is reflected across-the-board in the provider’s costs. What if we could achieve the same outcome with a much simpler cloud offering like a single VM?

We did a bit of research and discovered that the Kubernetes and object storage landscape has indeed evolved significantly since the early days of mybinder.org. For example, Hetzner is a cloud provider that has been around for a long time. It has single-node VMs that are about 4x cheaper than their counterparts in Google Cloud or AWS, and provides managed object storage that uses MinIO in a cost-effective way. Using K3s, we can run a lightweight, single-node Kubernetes runtime on this node, and deploy a BinderHub with the same infrastructure as any other BinderHub federation member.

By our estimate, we could fit around 400 simultaneous sessions on this node (because each session uses very few cloud resources). This is already the majority of mybinder.org’s capacity needs, and at a much lower cost than using a scalable Kubernetes cluster. The cost picture looks something like this:

If your single VM is much cheaper, it might still be the cheapest option. In the case of a Hetzner VM, it has roughly the same capacity as another cloud provider’s VM, but at 1/4 of the cost.
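The two cost pictures above can be sketched as a toy model in Python. Everything here is hypothetical: the usage numbers, the capacity of 100 sessions per node, and the unit prices are invented for illustration, and the only figure borrowed from this post is the "1/4 of the cost" ratio for the cheaper VM.

```python
import math


def fixed_cost(usage, capacity_per_node, node_price):
    """Cost of always running enough nodes to handle peak usage."""
    peak_nodes = math.ceil(max(usage) / capacity_per_node)
    return peak_nodes * node_price * len(usage)


def autoscaled_cost(usage, capacity_per_node, node_price):
    """Cost when the node count follows usage, with at least one node running."""
    return sum(
        max(1, math.ceil(sessions / capacity_per_node)) * node_price
        for sessions in usage
    )


# Hypothetical number of simultaneous sessions in six consecutive time periods.
usage = [20, 50, 180, 400, 150, 30]

# Same price per unit of capacity: autoscaling wins over provisioning for the peak.
always_peak = fixed_cost(usage, capacity_per_node=100, node_price=1.0)
autoscaled = autoscaled_cost(usage, capacity_per_node=100, node_price=1.0)

# Same peak capacity at 1/4 of the price: the fixed option can win again.
cheap_fixed = fixed_cost(usage, capacity_per_node=100, node_price=0.25)

print(always_peak, autoscaled, cheap_fixed)
```

With these made-up numbers, provisioning for peak costs 24 units, autoscaling costs 11, and the cheaper fixed VM costs 6. The ordering depends entirely on how spiky usage is and how large the price gap is, so this is a way to reason about the tradeoff rather than a prediction.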

2i2c.mybinder.org now serves 70% of the mybinder.org federation #

About a week ago, we launched 2i2c.mybinder.org running via the methodology we described above. We intended to run this as a longer experiment, but believe that it has already proven useful enough to consider “ready for production”. We recently increased 2i2c.mybinder.org’s load to 70% and will continue to monitor its performance over time. Here’s a plot of where each mybinder.org session has been run over the past ten days - you can see the moment where we turned on 2i2c.mybinder.org to the left:

Sessions launched on mybinder.org’s federation over the past ten days. The yellow area represents sessions run on 2i2c.mybinder.org. They now make up the majority of launches on mybinder.org. Prior to this, gesis.mybinder.org was the only remaining federation member.

For now, 2i2c is sponsoring a max of €350 a month (with some currency conversion noise) to run this service. We’ll provide in-kind labor to run this node, and treat it as an organizational investment in supporting open science, as well as learning new Kubernetes and cloud infrastructure workflows. We’re going to use funds recovered from communities in our community hub network, along with in-kind labor to build out this experiment.

In six months, we’ll evaluate how much effort it was to run this node for mybinder.org, whether it meaningfully helped with mybinder.org’s capacity, and whether it was sustainable for us from a time and labor perspective.

Others can join the mybinder.org federation using this approach as well #

We think that developing this single-node BinderHub workflow will make it much easier for others to join the mybinder.org federation, because it lowers the infrastructure and skills complexity needed to join. Here is a brief guide we’ve written for deploying a BinderHub with K3s. We are helping a few interested organizations deploy their own BinderHubs in this way in order to validate the idea, and are hopeful that this makes it much easier to grow mybinder.org’s capacity via new federation members.⁵

We’re excited to experiment with new ways to support mybinder.org. We think this is an excellent example of how open standards and technology lead to cloud workflows with lower costs and more flexibility. We also think it’s a good example of how it is valuable to have organizations aligned with open science (like 2i2c!) acting in this space. If you have any questions or comments, please join this Jupyter Zulip thread.

Anybody want to fund this? #

If you’re interested in making open science infrastructure like Binder more scalable and sustainable, we’d love to find more resources to both sustain this node and cover more development time to run this experiment. Feel free to reach out here.

If you have access to VMs and object storage, and are interested in running a mybinder.org federation member using the methods described here, check out our brief guide for deploying a BinderHub with K3s.

If you’re generally interested in supporting mybinder.org with cloud resources, financial resources, or human resources, please see the Support Binder page for how you can help.

Acknowledgements #

Thanks to the JupyterHub community for helping us set up this new node.
Thanks to our member communities whose fees currently support this work.

¹ Many thanks to GESIS and OVH for their continued support of mybinder.org. Your contributions to keeping this service running are critical!
² Thanks to Carl Boettiger for collaborating on this with us!
³ One example is MinIO, which is used by Hetzner to provide managed object storage for their single-node VMs.
⁴ For example, Hetzner provides a single-VM option with managed object storage that is roughly 25% of the cost of other cloud providers that also offer autoscaling Kubernetes services. There are many other infrastructure providers who could be used in this way.
⁵ We’re also experimenting with a few other ways to reduce the complexity and costs of running a BinderHub even further, but will have more on that later as we learn more :-).

Announcing our formal commitment to open technology

Wed, 15 Jan 2025 00:00:00 +0000

In this post, we’re sharing our Commitment to Open Technology. It is focused on software licenses for reasons we’ll describe below. We hope that it clarifies what kind of licenses we’ll use, and assures our communities that we will not change our stance towards open source technology in the future. This ensures 2i2c’s long-term commitment to community-owned and open infrastructure.

Being a platform and service provider gives us a lot of power, and also introduces a potential source of lock-in for our member communities. While 2i2c’s organizational mission and culture are strongly aligned with open infrastructure, we believe it’s important to encode commitments like these in a formal way to provide both transparency and accountability to our member communities.

Our commitment to open technology #

Below we copy the original language of this policy from our Commitment to Open Technology:

Definitions of MUST, MUST NOT, SHOULD, MAY, etc are defined in RFC 2119

1. All engineering artifacts (code, documentation, etc) produced by 2i2c’s engineering team MUST be licensed under an open source license approved by a non-profit organization that is not 2i2c.
2. Open Source Projects originating at 2i2c, or stewarded by 2i2c, MUST NOT require a Contributor Licensing Agreement that includes Copyright Assignment to 2i2c.
3. The list of external organizations that define licenses we accept are
   1. the Open Source Initiative
   2. the Organization for Ethical Source.
4. Modifying (1), (2), or (3) MUST be done through a 2/3 majority vote of 2i2c staff.

What does this commitment mean? #

In plain language, here’s what this commitment means:

We’ll only use open source licenses that have been approved by standard non-profits that are broadly recognized by the tech industry.
For anything we build, we won’t require contributors to give up the rights to their contributions via CLAs, so that it is much harder for 2i2c to change our licenses in the future.
Changing this policy will require organization-wide agreement, and in the future we’ll give authority over this policy to a group of people representing our member communities.

Why are licenses and CLAs important? #

Many organizations claim to be committed to open infrastructure, while retaining the ability to change this commitment in the future when it is in their interests. A classic example of this is a “bait and switch” that looks something like this:

A company releases software under an open source license and professes to build an open source community around it.
However, they retain the rights to all of the code in their projects through a Contributor License Agreement (CLA) with copyright assignment. This generally means that contributors must give up the rights to their contribution in order to make that contribution.
Once their product has gained traction and it is in their interests, the company can change the license to whatever they wish (even one that is not open source) because they retain the rights to all contributions in the codebase.
They then leverage this new position as owners of a proprietary project to extract business value or grow their position in a market.

Think this sounds unlikely? Here are just a few recent examples of companies that have switched their license after many years of releasing their technology under an open source license:

We want to assure our communities that 2i2c is not headed down this path, in order to give them confidence in treating us as a long-term service partner.

What does this change about 2i2c’s open source commitment? #

In short: nothing. These are already the principles that 2i2c was committed to from its inception, and already implied via our Right to Replicate. However, we wanted to make these commitments more formally in order to give ourselves more accountability to sticking with them, and to provide more transparency for our community members and stakeholders.

Who is this for? #

We imagine three audiences for this policy:

2i2c present and future staff who want to ensure that their organization remains committed to our open principles. This document provides a sense of psychological safety to have bold discussions about structuring our approach to open source.
Member communities and 2i2c stakeholders who need to have an understanding of the guarantees that we provide in order to trust 2i2c as a service developer and provider. This is similar to the effect our Right to Replicate has.
Open source communities who need to understand our long-term commitment and goals around open technology in order to trust us as a peer and collaborator within open source communities.

We’d love feedback #

We hope that these ideas both clarify our intent and the reason that we think it’s important. We’d love feedback about early refinements to these principles in order to make them more effective, as well as ways that we can provide more community oversight and participation in evolving these policies moving forward. If you have any thoughts to share, please send feedback via e-mail to hello@2i2c.org.

Acknowledgements: The creation of this policy and the rationale behind it was led by Yuvi Panda with feedback from 2i2c’s team. This blog post was co-written with Chris Holdgraf. Strategic work like this is supported by a grant from The Navigation Fund.

NASA VEDA & 2i2c Update for Q4 2024 (Oct-Dec 2024)

Tue, 07 Jan 2025 15:18:37 -0800

A non-exhaustive list of things 2i2c and Development Seed did with the NASA VEDA project last quarter!

Automated backups and alerting with `jupyterhub-home-nfs` #

Tracking Issue

jupyterhub-home-nfs is a young project to provide flexible per-user home directory limits on JupyterHub - an important feature for controlling cloud costs. Tarashish Mishra and Sarah Gibson have been leading this project for the last few months. Since we are moving away from AWS Managed EFS here, we had to do some work to recreate some of the benefits EFS gives us out of the box. During this quarter, we:

Set up automated backups so we can recover files in cases of disaster
Set up automated alerting (via Prometheus and PagerDuty) to know if our backing EBS device is getting full and we need to perform a manual intervention
Deployed this to a few other communities (CryoCloud and NMFS Openscapes) to broaden adoption.
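The condition behind the "disk is getting full" alert is simple, and can be sketched with the Python standard library. This is not how our alerting is implemented (that goes through Prometheus and PagerDuty, as noted above). The function names and the 90% threshold are assumptions made for the example.

```python
import shutil


def usage_fraction(path):
    """Fraction of the filesystem containing `path` that is currently used."""
    total, used, _free = shutil.disk_usage(path)
    return used / total


def needs_attention(path, threshold=0.9):
    """True when the filesystem is at least `threshold` full (hypothetical 90% default)."""
    return usage_fraction(path) >= threshold


# Check the filesystem backing the current directory.
print(f"{usage_fraction('.'):.0%} used, needs attention: {needs_attention('.')}")
```

A real alert would also need to consider how fast the disk is filling, so that there is enough time to act before it is full.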

We will continue doing work on jupyterhub-home-nfs in the upcoming quarter! If this is functionality you are interested in deploying, please reach out to us to collaborate!

Enable users to dynamically build environments with `jupyterhub-fancy-profiles` #

Tracking Issue

We covered this more extensively in another blog post, so go read that!

This work in particular is a good demonstrator of 2i2c’s value - it started off with a grant from GESIS, and now with support from NASA IMPACT we are able to bring it to a lot of communities, not just the ones that funded it.

Ongoing work here will focus on improving the UX as well as better documentation so users can actually use it!

“Open in QGIS” from VEDA UI #

Tracking Issue

We had worked in the past with many communities in enabling QGIS on the Cloud, and this quarter we got closer to enabling a contextual ‘Open in QGIS’ button in the VEDA Dashboard! Here is a quick demo:

(This shows the workflow when the user is already logged into the JupyterHub and has started the server)

You can play with this in this preview, although you need to have access to the NASA VEDA hub to fully try it out at this point.

Tarashish from Development Seed is again responsible for most of the work here, available in jupyter-remote-qgis-proxy. You can use it to create ‘magic links’ that will open QGIS in a desktop environment in your browser, and add a specific layer to it! Our hope is that this allows primarily GIS folks to better use tools they already are familiar with in cloud based contexts.

Other updates #

We participated heavily in an evaluation process for the authentication and authorization solution to be used across NASA VEDA! Tracking Issue
We are very close to rolling out JupyterHub 5.0 and associated changes across all our hubs, which will enable us to eventually offer per-group shared directories! Tracking Issue

Acknowledgements #

Thanks to the NASA VEDA project for their ongoing support for this work.
Thanks to Development Seed for their collaboration and leadership on this project.

`frx-challenges`: A new tool to host data challenges for Frictionless Research Exchanges

Fri, 06 Dec 2024 00:00:00 +0000

2i2c is pleased to announce the frx-challenges project, a new open source tool to help communities host data challenges on shared infrastructure:

2i2c-org/frx-challenges

This project aims to make it easier for administrators to provide a service that enables users to submit code and data that are evaluated on secure infrastructure with access to private data and resources. It also provides a leaderboard that helps users compare their performance against others.

An example leaderboard for a data challenge, taken from the Cellmap Challenge. Users make submissions that are run against secure and private infrastructure and data, and receive feedback about their submission’s performance. Learn more about the FRX challenges project here: 2i2c.org/frx-challenges/

It is designed to be lightweight and flexible, and can be run on a variety of shared infrastructure. For those who wish to run this project on cloud infrastructure, we’ve also published a Helm Chart to help you deploy frx-challenges with Kubernetes.

While it can be run on its own, we believe that it naturally complements other tools and services for interactive computing and data, such as JupyterHub, Jupyter Book, and Binder. More on that below.

Below is a brief description of the motivation behind this project.

What are Frictionless Research Exchanges? #

The project is heavily inspired by David Donoho’s vision of Frictionless Research Exchanges (FRX) as described in Data Science at the Singularity.

In this article, Donoho describes three key pillars for Frictionless Research Exchanges:

The three initiatives are related but separate; and all three have to come together, and in a particularly strong way, to provide the conditions for the new era. Here they are:

[FR-1: Data] datafication of everything, with a culture of research data sharing. One can now find datasets publicly available online on a bewildering variety of topics, from chest x-rays to cosmic microwave background measurements to uber routes to geospatial crop identifications.

[FR-2: Re-execution] research code sharing including the ability to exactly re-execute the same complete workflow by different researchers.

[FR-3: Challenges] adopting challenge problems as a new paradigm powering scientific research. The paradigm includes: a shared public dataset, a prescribed and quantified task performance metric, a set of enrolled competitors seeking to outperform each other on the task, and a public leaderboard. Thousands of such challenges with millions of entries have now taken place, across many fields.

We considered the landscape of tools and services, and felt that [FR-1] and [FR-2] were already well-served by a variety of tools and services for community workspace infrastructure (e.g., JupyterHub: jupyterhub.readthedocs.io), sharable computational environments (e.g., BinderHub: binderhub.readthedocs.io), authoring and reading computational narratives (e.g., Jupyter Book: jupyterbook.org and MyST: mystmd.org), and data I/O tools and standards (e.g., Zarr: zarr.readthedocs.io and Intake: intake.readthedocs.io).

However, there was a natural missing piece for [FR-3: Challenges], and we could not identify any community-managed infrastructure that facilitated data challenges. This is the goal of frx-challenges.

Why facilitate data challenges? #

Data challenges are harder than you think! While it is simple enough to run somebody else’s code locally, data challenges require a systematic, secure, and automated approach to accepting and evaluating submissions in a fair and repeatable way. Here are some of the big challenges to tackle:

Submissions must retain user and team identity, which means that we must keep track of users and their submissions over time, since data challenges are designed to encourage iterative improvement and optimization.
Evaluations must use potentially complex resources and data since many data challenges operate by publicly sharing a small dataset, and then running it against a much more complex dataset.
Evaluations must be totally secure, so that submissions can’t do nefarious things like mine cryptocurrency or extract the challenge’s private data in unintended ways.
Evaluations must be automated, so that running the challenge does not require extensive human intervention and can scale to many users.
Evaluations must be flexible, so that the infrastructure can accept a variety of types of submissions (e.g. code, data, model weights, etc), run them with arbitrary environments designed by the organizers, and run them with the right hardware to get the job done.

These are just a few of the major challenges that we’ve tried to address with frx-challenges, and we’re excited to see how it goes with our first assisted community challenge: the Cellmap Challenge.
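To illustrate the first requirement (keeping track of teams and their submissions over time), here is a minimal Python sketch of a leaderboard that keeps each team's best submission. It is not taken from the frx-challenges codebase. The `Submission` type, the `leaderboard` function, and the "higher score is better" rule are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    """Hypothetical record of one evaluated submission."""

    team: str
    submitted_at: int  # increasing counter standing in for a timestamp
    score: float       # higher is better in this sketch


def leaderboard(submissions):
    """Keep each team's best submission, ranked by score (earlier submission wins ties)."""
    best = {}
    for submission in submissions:
        current = best.get(submission.team)
        if current is None or submission.score > current.score:
            best[submission.team] = submission
    return sorted(best.values(), key=lambda s: (-s.score, s.submitted_at))


# Made-up submissions: teams iterate and improve over time.
submissions = [
    Submission("team-a", 1, 0.61),
    Submission("team-b", 2, 0.70),
    Submission("team-a", 3, 0.74),
    Submission("team-c", 4, 0.70),
    Submission("team-b", 5, 0.66),
]

for rank, entry in enumerate(leaderboard(submissions), start=1):
    print(rank, entry.team, entry.score)
```

The harder parts listed above (secure, automated, flexible evaluation against private data) are exactly what this sketch leaves out, and are where most of the real work lies.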

If you’re interested in learning more or participating in this project, follow along at its GitHub repository:

2i2c-org/frx-challenges

This is still the very early stages of the project, and we imagine it will evolve significantly. We welcome feedback for how it can more effectively serve a variety of communities.

Acknowledgements #

Thanks to the Howard Hughes Medical Institute (HHMI) for collaborating with us on the Cellmap Challenge, which led to the creation of this project.

Thanks to Kristen Ratan and Strategies for Open Science (Stratos) for enabling this collaboration, and providing strategic guidance and support.

Improving the logged in home page experience in JupyterHub with `jupyterhub-fancy-profiles`

Mon, 18 Nov 2024 12:55:20 -0800

On most research-oriented JupyterHub installations, users would like to customize their server (the environment, resources available, etc) after logging in. In Kubernetes-based JupyterHub environments, a profile list provides this functionality.

(Profile List for the NASA VEDA JupyterHub with the default implementation from KubeSpawner)

The profile list is the de-facto “logged in homepage” for these users, as that is what they see after they have logged in.
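For readers who have not seen one, a profile list is a plain data structure in the JupyterHub configuration. The Python sketch below shows the general shape KubeSpawner expects. The image names, resource values, and descriptions are invented for illustration, and the per-choice `description` key is included as an assumption, so check the KubeSpawner and jupyterhub-fancy-profiles documentation for the exact schema.

```python
# Hypothetical profile list: image names and resource values are made up.
profile_list = [
    {
        "display_name": "Python environment",
        "slug": "python",
        "description": "A general purpose Python environment",
        "default": True,
        "kubespawner_override": {"image": "example.org/python-env:latest"},
        "profile_options": {
            "resources": {
                "display_name": "Resource allocation",
                "choices": {
                    "small": {
                        "display_name": "Small",
                        "description": "2 GB of RAM, for light exploration",
                        "default": True,
                        "kubespawner_override": {"mem_limit": "2G"},
                    },
                    "large": {
                        "display_name": "Large",
                        "description": "16 GB of RAM, for heavier analysis",
                        "kubespawner_override": {"mem_limit": "16G"},
                    },
                },
            },
        },
    },
    {
        "display_name": "R environment",
        "slug": "r",
        "description": "An environment for R users",
        "kubespawner_override": {"image": "example.org/r-env:latest"},
    },
]

# In jupyterhub_config.py this would be assigned as:
#   c.KubeSpawner.profile_list = profile_list

# Sanity check: exactly one profile should be marked as the default.
default_slugs = [p["slug"] for p in profile_list if p.get("default")]
print(default_slugs)
```

Each profile becomes one entry on the page users see after logging in, and each entry under `profile_options` becomes a dropdown within that profile.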

In collaboration with Development Seed, funded by our earlier grant from GESIS as well as the NASA VEDA project, we have been building the jupyterhub-fancy-profiles project to improve this experience.

(Profile List for the NASA VEDA JupyterHub with jupyterhub-fancy-profiles)

Last week, we rolled this new experience out to all 2i2c managed JupyterHubs! Here’s a quick rundown of what this enables:

Descriptions for choices in the dropdowns, making it much easier for users to know what they are getting with each environment (or resource selection).
Fully backwards compatible with the existing KubeSpawner profile list implementation. In our PR to roll this out to all hubs, you’ll notice that we didn’t have to change the structure of any profile lists! So you can safely roll this out to your hubs too without needing to fundamentally change how your profiles are set up.
It is a modern web app (built with React), just like the JupyterHub admin panel. This allows us to evolve and satisfy user needs much faster, and expands the pool of people who can contribute to the project!
Support for dynamically building images using mybinder.org style repositories! It talks to the BinderHub API so users can build reproducible environments as they wish, without admin involvement and without needing to fully understand how Docker and containers work. Our earlier blog post has more information.

This is just the start, and thanks to ongoing funding from the NASA VEDA project, we are going to continue making improvements to this experience.

Use this in your JupyterHub #

As with everything we build at 2i2c (per our right to replicate policy), this project can be used with any JupyterHub installation that uses Kubernetes. There are instructions in the README. Please try it out on yours and let us know what you think!

Credit #

The project was initiated with funding generously provided by GESIS (see our earlier blog post).
Sanjay Bhangar and Oliver Roick from Development Seed for advocating for this project and contributing heavily to it.
The NASA VEDA project (in particular, Brian Freitag and Alex Mandel), for continued funding (in the form of engineering time) plus being early adopters!

Collaborating with Development Seed to deliver cyberinfrastructure for NASA VEDA

Fri, 12 Jul 2024 00:00:00 +0000

Thank you to Sajjad Anwar and Sanjay Bhangar for contributing to this post.

The VEDA dashboard

The 2i2c team are proud to continue our strong working collaboration with Development Seed, following our previous work on launching the US GHG center (also see the Development Seed blog post). Together with scientists at NASA in our regular sync touchpoints, we have recently delivered a tranche of improvements to the Visualization, Exploration and Data Analysis (VEDA) project.

This platform is designed to thread open-source components together, consolidating GIS delivery mechanisms and processing, analysis, and visualization tools, and presenting them in a collaborative interactive computing environment. All code repositories and associated resources stemming from this work are available on the VEDA GitHub page.

In the spirit of fully open development, you can see the objectives the combined 2i2c and Development Seed team had for the last quarter. In this blog post, we will describe some of the significant ones!

Better image management and testing #

The repo2docker-action is a GitHub action simplifying image building and testing for use with JupyterHub, using either a Dockerfile or various configuration files (like requirements.txt, environment.yml, etc) supported by repo2docker. We migrated our image building pipeline from a somewhat homegrown solution to this upstream action, making image updates and testing much easier. In particular, we can automatically run test notebooks on every change we make to the image! This way, we can easily catch any breaking changes in library versions or other package installs without disrupting users. We also debugged and contributed upstream fixes to the testing infrastructure so everyone could benefit from this, rather than just us.

Automatically pulling example notebooks on startup #

When a user logs into a JupyterHub, it is very helpful to have a bunch of example notebooks and other content pre-populated for them so they can get started right away. nbgitpuller is heavily used for this particular use case. However, it requires that nbgitpuller is installed inside the image the user is using - and not all images have it installed. In particular, we wanted to continue using the (wonderful) Rocker images maintained upstream for R users, however they do not have nbgitpuller installed. To solve this problem we built jupyterhub-gitpuller-init, which can be used as an init container to pre-populate user content on persistent home directories regardless of the image used. We also made sure to build this in a way that anyone can use it, and it is not tied to either 2i2c or VEDA infrastructure!

Opening specific visualizations in QGIS via URL #

QGIS is the world’s most used open source GIS software, and 2i2c had previously worked with Openscapes and QGreenland to bring this desktop software to JupyterHub. We had worked on a container image that allows users to access large datasets stored in the cloud directly through QGIS on the JupyterHub, so they can work with much larger datasets than they could on their desktops by bringing cloud compute adjacent to the data. As a continuation of this work, we developed jupyter-remote-qgis-proxy, which builds QGIS-specific features on top of jupyter-remote-desktop-proxy. In particular, it allows the creation of shareable links that, when clicked, open specific datasets and layers in QGIS in a JupyterHub! You can see this in action:

Launching QGIS on a Linux desktop served by the VEDA JupyterHub

This opens up exciting future possibilities. Imagine this exploration of the Camp Fire having an ‘Open in QGIS’ button that enables further exploration of the data without the user needing to download or install anything! Work will continue in the coming quarter towards achieving this vision.

We are also excited to see recent work in this space from QuantStack and Simula Labs, and will follow up to ensure an orderly transition to more web native workflows for existing users of QGIS in due time.

Better Profile Selection #

This is a continuation of our GESIS collaboration. In the path to deploying dynamic image building to end users, we wanted to stabilize jupyterhub-fancy-profiles enough to deploy to users of VEDA (and eventually everyone else). This is the primary interface users see after they log in to JupyterHub, and was ripe for UX improvements. The default interface looks like this:

The revamped one is much more streamlined and looks like this:

Revamped Profile Screen

This is currently deployed to a staging hub and has helped us shake out a lot of bugs! We expect the improved interface will be rolled out to all users in the near future. We are also planning further development to make the user experience even better and smoother for everyone.

Supporting workshops #

End users benefiting from our work is what ultimately gives meaning to our work. To that end, we were very happy to support running workshops during this collaboration – see our related blog post US Greenhouse Gas Center supports summer school at CIRA for more information.

Ongoing Collaboration #

Delivering on these objectives in a timely way heavily depended on the success of the team collaboration. Sanjay Bhangar of Development Seed commented:

Working closely with the 2i2c team on growing features to support users on the VEDA and GHG Center hubs has been absolutely amazing. With 2i2c’s deep experience in the Jupyter ecosystem, we have been able to implement some fairly complex features quite easily, and their strong open-source roots have ensured that whatever we work on is broadly useful to the wider Jupyter and scientific computing communities.

Take a look at the companion Development Seed blog post of this work.

This collaboration continues, and we have now published our objectives for the coming quarter. Watch this space!

Acknowledgements #

Development Seed
NASA IMPACT
Tarashish Mishra, Julia Signell, Oliver Roick, Slesa Adhikari and Sanjay Bhangar for various code contributions towards these objectives

Openscapes Host a Surface Biology and Geology Workshop with Shared Password Feature

Tue, 09 Jul 2024 00:00:00 +0000

Thanks to Brianna Lind, Julia Lowndes and Andy Teucher for contributing to this blog post!

Surface Biology and Geology: VITALS Workshop

Openscapes is a value-based initiative that supports kinder, better science based on open source community. NASA Openscapes is in its fourth year as a project supporting NASA Earth science in the Cloud, co-developed by Julia Lowndes (Openscapes) and Erin Robinson (Metadata Game Changers).

The initiative recently supported the Surface Biology and Geology: VITALS Workshop hosted by NASA Land Processes Distributed Active Archive Center (LP DAAC) and NASA Jet Propulsion Laboratory (JPL).

Instructors used the 2i2c Openscapes Hub to lead hands-on exercises teaching learners how to manipulate data collected from the ECOSTRESS and EMIT instruments onboard the International Space Station. They used Jupyter notebooks in the Hub to demonstrate how open source tools together with cloud data and compute resources could effectively analyse the Canopy Water Content and the Land Surface Temperature over the Jack and Laura Dangermond Preserve, Santa Barbara, CA.

Plot of the Canopy Water Content over the Jack and Laura Dangermond Preserve, Santa Barbara, CA from a VITALS Workshop Jupyter notebook.

This event was attended by around 250 participants. An event of this size requires a frictionless login flow so that organizers can focus on the essential complexity of teaching data analysis rather than the accidental complexity of managing Hub authorization. GitHub authentication is the default option for most 2i2c Hubs for research use cases, but for an educational event of this size it was not fit for purpose, since organizers would have had to:

Retrieve the GitHub usernames of each participant (assuming everyone was familiar with GitHub!)
Manually invite GitHub users to a GitHub organization to authorize access to the Hub (invitations would expire within seven days)
Repeat the above two steps last-minute for participants who showed up on the day without preparing
Manually remove GitHub users from the GitHub organization if they wanted to revoke access to the Hub after the event.

In response to this need, we developed a shared password feature so that workshop organizers can simply hand the shared password out to learners for access to the Hub. This bypassed the manual labour of managing GitHub accounts, avoided adding to the learners’ already high cognitive load, and improved the participants’ learning experience overall.

One of the elements that enabled us to recognize and solve this issue effectively is our close partnership with the Openscapes team. We engage in regular 6-weekly catch-ups where we can learn about user requirements and how we can develop our infrastructure to co-create optimal solutions. Together with our Product Delivery Flow, we were quickly able to architect the shared password solution in time for the workshop.

Feedback from Brianna Lind (LP DAAC)

We have documented the technical infrastructure changes required to enable a shared password for the Hub in our Infrastructure Guide and hope to support many future events with this mechanism!

Acknowledgements #

NASA Openscapes
NASA LP DAAC
NASA JPL
NASA ROSES funding
NASA Open Science / ScienceCore for supporting some of our work on JupyterHub.

Enabling neuroscience in the cloud with HHMI Spyglass and MySQL on JupyterHub

Fri, 05 Jul 2024 00:00:00 +0000

The HHMI Spyglass tutorial

Spyglass #

Spyglass is a framework for reproducible and shareable neuroscience research produced by Loren Frank’s lab at the University of California, San Francisco. Check out our blog post about the release of their preprint to read more about the methods.

This post focuses on the complex data storage needed for the project, which can be difficult to set up locally or at scale in the cloud. In particular, the analysis needed a MySQL database for reproducibility. This is a fairly common task across many fields. The aim of 2i2c is to enable researchers to focus on the essential complexity of what they are doing, i.e. the science, without managing the accidental complexity of how to do it – in this case, setting up databases.

We describe how you can do this too for your own JupyterHubs. Since 2i2c commits to running our infrastructure in line with open-source values as much as possible, you can also directly see the configuration for the hub referenced in the paper.

What is a “sidecar container”? #

The Kubernetes definition of a sidecar container is

Sidecar containers are the secondary containers that run along with the main application container within the same Pod. These containers are used to enhance or to extend the functionality of the primary app container by providing additional services, or functionality such as logging, monitoring, security, or data synchronization, without directly altering the primary application code.

In this case, the primary app container is the JupyterLab instance where people are interactively running code and doing science. We want to provide a MySQL database as a sidecar so that each user server gets their own independent MySQL server instance (that is not accessible to anyone else). We can then run code such as

%%bash
mysql -h 127.0.0.1 -u root --password=tutorial < path-to-sql-file-with-data

to load data into the database. Note the IP address 127.0.0.1 - the MySQL server is listening on localhost, even though it is not running in the same container! Thanks to the magic of Linux Network Namespaces, the sidecar and main app container can share 127.0.0.1. This allows you to write code that works in the exact same way on a user’s local computers as on the JupyterHub, making transitions and replication easier.
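Because the sidecar starts alongside the notebook server, the database may not be accepting connections the instant a notebook runs. The sketch below, using only the standard library, waits for a TCP port on localhost to open; the helper name and default timings are assumptions for illustration, not part of Spyglass or JupyterHub.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=3306, timeout=30.0, interval=0.5):
    """Return True once host:port accepts TCP connections, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

The same call works unchanged on a laptop running MySQL locally, since both setups expose the server on 127.0.0.1.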

Setting up sidecars in JupyterHub on Kubernetes #

We’re leveraging multiple tools from the open-source ecosystem - JupyterHub, Kubernetes, Linux as well as MySQL itself.

Since this is a Kubernetes feature, we can pass through config to it. There are two layers here, which are

singleuser.extraContainers in z2jh configuration
KubeSpawner.extra_containers in KubeSpawner configuration

The hub configuration looks like

 singleuser:
   extraContainers:
     - name: mysql
       image: datajoint/mysql:8.0 # following the spyglass tutorial at https://lorenfranklab.github.io/spyglass/latest/notebooks/00_Setup/#existing-database
       ports:
         - name: mysql
           containerPort: 3306
       resources:
         limits:
           # Best effort only. No more than 1 CPU, and if mysql uses more than 4G, restart it
           memory: 4Gi
           cpu: 1.0
         requests:
           # If we don't set requests, k8s sets requests == limits!
           # So we set something tiny
           memory: 64Mi
           cpu: 0.01
       env:
         # Configured using the env vars documented in https://lorenfranklab.github.io/spyglass/latest/notebooks/00_Setup/#existing-database
         - name: MYSQL_ROOT_PASSWORD
           value: "tutorial"

By setting this up, we allow users to insert the code snippet above

%%bash
mysql -h 127.0.0.1 -u root --password=tutorial < path-to-sql-file-with-data

into their Jupyter Notebooks, which gives access to their MySQL database in the hub!
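For hubs configured at the KubeSpawner layer rather than through z2jh, the same sidecar could be expressed roughly as follows. This is a sketch that mirrors the values in the YAML configuration; the helper function is an assumption for illustration, while `KubeSpawner.extra_containers` is the option named earlier.

```python
def mysql_sidecar(root_password="tutorial"):
    """Return a Kubernetes container spec for a per-user MySQL sidecar."""
    return {
        "name": "mysql",
        "image": "datajoint/mysql:8.0",
        "ports": [{"name": "mysql", "containerPort": 3306}],
        "resources": {
            # Cap usage at 1 CPU and 4Gi of memory.
            "limits": {"memory": "4Gi", "cpu": 1.0},
            # Tiny requests, so the sidecar does not reserve its full limit.
            "requests": {"memory": "64Mi", "cpu": 0.01},
        },
        "env": [{"name": "MYSQL_ROOT_PASSWORD", "value": root_password}],
    }


# In jupyterhub_config.py, where `c` is the config object JupyterHub provides:
# c.KubeSpawner.extra_containers = [mysql_sidecar()]
```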

However, this configuration does not permanently store the database itself between hub server sessions. Thanks to a pilot in a prior collaboration with University of Texas, Austin, we do have some documentation on how you can enable that as well!

Acknowledgements #

Howard Hughes Medical Institute
National Institute of Mental Health (NIMH), grant number RF1MH130623
kubespawner
zero-to-jupyterhub-k8s and the JupyterHub community

Integrating BinderHub with JupyterHub: Empowering users to manage their own environments

Wed, 03 Jan 2024 16:56:14 -0800

Thanks to Arnim Bleier, Jenny Wong, Georgiana Elena, Damián Avila, Jim Colliander and James Munroe for contributing to this blog post

mybinder.org is a very popular service that allows end users to specify and share the environment (languages, packages, etc) required for their notebooks to run correctly by placing configuration files they are already familiar with (like requirements.txt or environment.yml) along with their notebooks. While not without its own set of challenges, this is extremely powerful because it puts control of the environment in the hands of the people who write the code. They can customize the environment to fit the needs of their code, instead of having to fit their code into the environment that admins have made available.

But mybinder.org (and the BinderHub software that powers it) is built for sharing your work after you are done with it, not for actively doing work. BinderHubs often do not have persistent storage nor persistent user identity, and UX is centered around ephemeral interactivity that can be shared with others (via a link), rather than persistent interactivity that a single user repeatedly comes back to. JupyterHub is more commonly used for this kind of workflow, but doesn’t currently have the ability for users to easily build their own environments. Admins who are running the JupyterHub can make multiple environments available for users to choose from, but this still puts admins in the critical path for environment customization.

Our collaboration with GESIS, NFDI4DS, and CESSDA, aims to bring this flexibility to JupyterHub directly. We aim to empower users to decide for themselves which applications and dependencies are installed on a per-project basis. Our work enables communities with heterogeneous requirements to share a single Hub. Our approach frees administrators from being overwhelmed by installation requests and transforms the JupyterHub platform into a platform for collaborative computational reproducibility. In this update, we report on our progress and upcoming steps in this project.

What does a BinderHub do, exactly? #

It is helpful to understand that BinderHub primarily has 3 responsibilities:

Present a UI to the end user for them to provide details on what to build (this is what you see when you go to mybinder.org)
Call out to repo2docker in a scalable way to actually build and push an image containing the environment for the given repository, and show the user logs as this build process happens. This also allows users to debug issues with their build more easily.
Talk to a JupyterHub instance to launch a user server with the built docker image, and redirect the user to this.

(2) is really the core feature of BinderHub, and we settled on figuring out how to make that available to JupyterHub users. It was really important to us that this was also done in a way that can be sustainably used by everyone, not just 2i2c. This blog post discusses the various improvements to the broad ecosystem of projects in the Jupyter ecosystem to get this done.

Demo #

But first, a very quick demo of what this looks like right now!

This is very much a work in progress, but the basic flow can be seen clearly. Users see a Server Options menu after they log into JupyterHub. They can specify the two primary things that determine the server configuration:

The resources allocated (RAM, CPU and maybe GPU)
The environment (container image) used, which can be specified in one of 3 ways:

a. A pre-selected list of environments (container images), provided by the administrators who set up this JupyterHub
b. A blank text box where users can enter any publicly available docker image they want
c. A mybinder.org style way to specify a GitHub repository, which will then be dynamically built into a docker image for the user!

So what did we need to do to accomplish this, in a way that’s very upstream friendly and usable by everyone (and not just 2i2c)?

A Standalone `binderhub-service` helm chart #

The default upstream BinderHub helm chart includes a JupyterHub as a dependency, and configures itself to be used primarily in a manner similar to mybinder.org. As the person who helped make that choice early on, I can tell you why it was made - for convenience! And it was very convenient, as it allowed us to get mybinder.org going fast. However, it makes it difficult to install a BinderHub service alongside an existing JupyterHub. To this end, we have created a standalone BinderHub helm chart, designed to be installed alongside an existing JupyterHub, so we can use it purely to build images. This allows the BinderHub instance to be used as a JupyterHub Service, which is what we want.

While this helm chart is currently under the 2i2c GitHub org, the hope is that it can eventually migrate to a jupyterhub-contrib organization (once it is created), or it can become the upstream helm chart for BinderHub if enough work can be done in BinderHub to allow it to serve use cases like mybinder.org.

As part of this work, we also added a way for BinderHub to run in API only mode, so we can fully turn off the UI and launching ability of BinderHub. This change decoupled the three responsibilities of BinderHub we discussed previously, allowing us to bring our own UI and JupyterHub. BinderHub could now be used purely for its scalable image building features, which is exactly what we want!
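To make the "image building only" role concrete, here is a hedged sketch of a client that builds a BinderHub build URL and reads the event stream it returns. The `/build/<provider>/<spec>` path and `data:` lines carrying JSON with a `phase` follow the BinderHub API as we understand it; the `imageName` field and the sample payloads in the usage note are illustrative and should be checked against the BinderHub documentation.

```python
import json
from urllib.parse import quote


def build_url(binderhub_url, provider, spec):
    """Return the build endpoint for a repository spec, e.g. provider 'gh'."""
    return f"{binderhub_url.rstrip('/')}/build/{provider}/{quote(spec, safe='/')}"


def parse_event(line):
    """Decode one `data: {...}` line of the event stream, else return None."""
    if not line.startswith("data:"):
        return None
    return json.loads(line[len("data:"):].strip())


def image_from_events(lines):
    """Scan build events and return the built image name, if one is reported."""
    for line in lines:
        event = parse_event(line)
        if event is None:
            continue
        if event.get("phase") == "failed":
            raise RuntimeError(event.get("message", "build failed"))
        if "imageName" in event:  # assumed field name, see lead-in
            return event["imageName"]
    return None
```

A frontend that brings its own UI would stream lines from `build_url(...)`, show each `message` to the user as a build log, and hand the resulting image name to JupyterHub to launch.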

Sustainably extending KubeSpawner’s `profileList` #

We identified KubeSpawner’s profileList feature as the ideal location for UI to dynamically build environments (container images), making it just another ’environment choice’ people can choose, along with picking the resources their server needs. From an end-user perspective, it was also the logical place for them to specify a repository to build into an environment, as they could already choose some pre-built environments from here. They can also select other arbitrary resources they want (such as memory, GPU, etc) from here as well. From a maintainer perspective, it helps with long-term maintenance of the JupyterHub projects.

The implementation of profileList however, was not easy to extend at this point. So this PR improved how easy it was to extend it in more complex ways, without making the implementation in KubeSpawner itself complicated. Even though this had no visible end-user effects, it was an extremely important step in allowing us to experiment with UI in a sustainable way without having to rely on upstream. These kinds of changes can sometimes be hard to sell to stakeholders but are extremely important in ensuring a continuous and sustainable relationship with upstream.

Implementing `unlisted_choice` feature in KubeSpawner #

The profileList feature was built to allow JupyterHub admins to specify an explicit list of container images the end-user can choose from. It did not have a way for any choice that was not pre-approved by the admin to be used. We needed this feature since the BinderHub API will build a new docker image for each environment the user wants, and so this can not be chosen from a pre-approved list. We had to safely add this feature to KubeSpawner in such a way that it was generally useful to everyone. Many other communities had been asking for such a feature anyway - the ability to simply ’type in’ an image and have that be used.

NASA VEDA was one such community, so we partnered with Sanjay Bhangar from Development Seed (an organization that helps run NASA VEDA) to implement this feature. Engineers from 2i2c contributed heavily to this feature as well, and after several PRs ( 1, 2, 3, 4 and 5), this feature is now available for everyone to use!

A key component of doing sustainable upstream work is that every addition needs to be useful by itself for a broad group of people. This change was very helpful for many communities that wanted to allow their users the freedom to pick whatever image they want to use, regardless of whether they wanted to use dynamic image building or not. The broad interest allowed us to build a coalition with other interested parties, and get the change accepted upstream more easily!
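A profile that offers one pre-approved image plus a free-text "type in an image" option might look roughly like the sketch below. The keys follow KubeSpawner's `profile_list` options as we understand them; the image names, regex, and the `resolve_unlisted_image` helper are hypothetical, and the helper is a simplified stand-in for what KubeSpawner does internally.

```python
import re


def image_profile(allowed_regex=r"^quay\.io/example-org/.+:.+$"):
    """Return one profile_list entry with a listed and an unlisted image choice."""
    return {
        "display_name": "Choose your environment",
        "slug": "custom-image",
        "profile_options": {
            "image": {
                "display_name": "Image",
                "choices": {
                    "default": {
                        "display_name": "Pre-approved image",
                        "default": True,
                        "kubespawner_override": {
                            "image": "quay.io/example-org/base:2024.01"
                        },
                    },
                },
                "unlisted_choice": {
                    "enabled": True,
                    "display_name": "Other image",
                    # Restrict what users may type in.
                    "validation_regex": allowed_regex,
                    "validation_message": "Image must come from quay.io/example-org",
                    # {value} is replaced with the text the user entered.
                    "kubespawner_override": {"image": "{value}"},
                },
            },
        },
    }


def resolve_unlisted_image(profile, value):
    """Simplified illustration of turning a typed-in value into an image."""
    unlisted = profile["profile_options"]["image"]["unlisted_choice"]
    if not unlisted["enabled"] or not re.match(unlisted["validation_regex"], value):
        raise ValueError(unlisted["validation_message"])
    return unlisted["kubespawner_override"]["image"].format(value=value)


# In jupyterhub_config.py: c.KubeSpawner.profile_list = [image_profile()]
```

The validation regex is what lets admins open up the choice safely: users can pick images nobody pre-listed, but only from sources the admin trusts.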

`jupyterhub-fancy-profiles` #

Once we had all these pieces in place, it was time to actually work on the frontend UI that would allow users to build images dynamically and launch them. Since this will replace the ‘profileList’ feature, it should also allow them to select different resources (RAM, CPU, etc) as needed, as well as type in an existing image if they desire. So it was a full re-implementation of the profileList frontend.

This is ongoing now at the jupyterhub-fancy-profiles project. It is a pure frontend web application, using modern frontend tooling (React, webpack, Babel, etc) and written in JavaScript. It’s gone through a few revisions, and the demo provided earlier in the blog post shows its current state. Because the default profileList implementation is pure HTML / CSS with very minimal JS, it is limited in what kind of UX it can have. jupyterhub-fancy-profiles aims to be very helpful even when dynamic image-building features are not enabled on a JupyterHub. We hope to roll this out to a few JupyterHubs and improve it over time based on feedback.

`@jupyterhub/binderhub-client` npm package #

While building jupyterhub-fancy-profiles, we wanted to use the same JavaScript code used by the BinderHub frontend to interact with the BinderHub API, instead of re-implementing it. However, the existing BinderHub JavaScript code was not easily consumable by external projects. We refactored the code, added tests, migrated to modern JS practices and published the @jupyterhub/binderhub-client npm package, which can be used not just by jupyterhub-fancy-profiles but by any external project for talking to the BinderHub API.

This had to be done in such a way that current BinderHub installations (such as mybinder.org) do not break. That took quite a few pull requests: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15. This refactoring work was very helpful to us, and also appreciated by the broader community.

Defending against cryptojacking with `cryptnono` #

For Open Science to flourish, we need to allow access to resources without login / paywalls wherever possible. A new menace against this has been cryptojacking - where attackers use up any and all available free compute to mine cryptocurrencies. This has affected many folks on the internet, including GitHub Actions and mybinder.org, the primary public BinderHub installation. mybinder.org has some extra protections against cryptojacking that aren’t easily usable elsewhere, and this has unfortunately meant that the demo JupyterHubs we have with these features enabled have been behind a login wall. I personally believe login walls are long term antithetical to open science, and so this was an important problem to solve.

cryptnono is an open source project designed to help fight cryptojacking, and as part of this grant we ported some of this functionality out of mybinder.org specific code into cryptnono, so other deployments may also benefit from it! We also migrated to using the super efficient ebpf Linux Kernel subsystem, allowing for more complex heuristics to catch a much broader range of cryptomining activity. We have been slowly tweaking the config on mybinder.org, and it has proven to be very effective! This will be very helpful for anyone who wants to provide a JupyterHub (or any other computational service) without a login wall. If you are interested in using cryptnono in this fashion, please reach out to us so we can work together!

Explored pathways that were then discarded #

Here are the approaches we tried and then decided against:

repo2docker-service, a separate JupyterHub service that could only build images. As we worked on it, we realized that it was replicating a lot of features that BinderHub already has, so we pivoted to working on BinderHub directly instead.
Building off of tljh-repo2docker. While this already had a nice UI, it would be hard to port it to run on a distributed Kubernetes environment without it becoming a ‘hard fork’.

While these did slow down the implementation of the project, they have allowed us to be very confident that the methods we have chosen are long-term sustainable.

Want to try this out? #

We have a demo of this running at imagebuilding-demo.2i2c.cloud, but unfortunately, as we are still fine-tuning the cryptnono config, it is not open to the public at this moment. Please contact me with your GitHub account if you want access; promise not to be a cryptominer and you shall be granted access.

Want to set this up on your own JupyterHub? There is some work in progress documentation and more is being worked on. Drop a line in the linked pull request and we’ll be happy to help. The eventual goal is for anyone to be able to simply follow documentation and set this up for themselves.

We also have user facing documentation on using this service on docs.2i2c.org.

Future work #

This is not complete of course, and there is a lot of future work to be done.

mybinder.org also helps you distribute your content, not just the environment for your code to run in. Since JupyterHub usually comes with a persistent home directory for the user, nbgitpuller is commonly used for this purpose instead. We should explore ways to integrate nbgitpuller (and other ways to distribute content) in the future.
More thorough documentation for how you can recreate what is in the demo for yourself in your own JupyterHub installation.
Better UX for specifying images, including figuring out how to ‘save’ them for future reuse.
Better compatibility with mybinder.org, particularly in allowing other sources of environments (not just GitHub, but Zenodo, raw git repositories, etc) and URL compatibility.
Better authentication workflow between the frontend and the BinderHub API.

Credit #

All this work would not be possible without a large group of collaborators!

From 2i2c: Erik Sundell, Georgiana Elena, Yuvi, James Munroe, and Damián Avila.
The persistent BinderHub project was the direct inspiration for all this work, with particular thanks to Kenan Erdogan.
The tljh-repo2docker project, which explores similar ideas in the context of running only on a single node.
The broad JupyterHub and MyBinder.org community, particularly Simon Li and MinRK.
Funding generously provided by GESIS in cooperation with NFDI4DS (project number: 460234259) and CESSDA.
Arnim Bleier from GESIS was instrumental in making this project happen.

2i2c supports Jupyter Docker Stacks ARM builds

Fri, 01 Dec 2023 00:00:00 +0000

The Jupyter Docker Stacks project provides a collection of ready-to-use Docker images for Jupyter environments. These images are used by many in the Jupyter community, including 2i2c which uses them as base images for our JupyterHub deployments.

The project recently began publishing ARM-compatible images alongside the standard x86 images, making it easier for users with ARM-based systems (like M1 Macs) to use these environments. However, building and hosting these ARM images comes with additional cloud computing costs that were being personally covered by @mathbunnyru, one of the project’s maintainers.

A part of 2i2c’s mission is supporting upstream communities that we rely on, especially where the upstream project has limited resources. For this reason, we’ve decided to support Jupyter Docker Stacks’ ARM build costs, with a total budget of $2000 (approximately $150 per month). As a regular user and beneficiary of the Jupyter Docker Stacks, we believe it’s important to contribute to the maintenance and sustainability of this crucial piece of infrastructure that benefits the entire Jupyter community.

We hope this support helps the Docker Stacks project remain healthy, and continue providing high-quality, multi-architecture images that work across different computing platforms. We’ll revisit this decision as the landscape of technology providers changes and other options arise.

Acknowledgments #

Thanks to Project Jupyter (particularly the jupyter-stacks team) for this project.

A QGIS desktop in the cloud with JupyterHub

Sat, 05 Aug 2023 00:00:00 +0000

The QGreenland Researcher Workshop

JupyterHub is a versatile platform that can serve a desktop with Geospatial Information Systems (GIS) software in the cloud. This was demonstrated by the QGreenland Researcher Workshop that was hosted by the NASA CryoCloud hub. The hands-on workshop trained 25-30 researchers, from Germany, India, France, Canada, Poland and the United States, on how to work with geospatial data in an open science framework.

QGreenland Overview #

QGreenland is an open-source geospatial data package designed for QGIS, a community-owned GIS platform. It focuses on Greenland, offering researchers and educators a comprehensive toolset for FAIR (findable, accessible, interoperable and reusable) data analysis. The package integrates a variety of datasets into a single, easy-to-use data-viewing and analysis platform, supporting both offline and online use. This makes it particularly valuable for remote fieldwork and areas with limited internet access.

Workshop Success #

The QGreenland workshop demonstrated several key benefits of using JupyterHub for cloud-based GIS:

Accessibility: Participants from across the world could access the same powerful GIS tools through a web browser, eliminating the need for complex local installations while enhancing reproducibility
Cloud block storage: Using a JupyterHub in the cloud allowed for faster data access than a traditional NFS file store by provisioning each user with an elastic block store disk, reducing load times from 5 minutes to under 3 seconds.
Cost Efficiency: Utilizing the CryoCloud JupyterHub instance managed by 2i2c drastically cut down setup costs and time, with only minimal cloud operating expenses of roughly $1/person/day.

Conclusion #

The success of the QGreenland workshop underscores the potential of integrating interactive software applications in JupyterHub. This approach not only democratizes access to advanced geospatial tools but also fosters a collaborative research environment. We look forward to supporting more workshops for QGreenland in the future!

Want to know more? Check out the companion post by QGreenland on the Jupyter Blog

Acknowledgements #

Trey Stafford (CIRES)
Matthew Fisher (CIRES)
*Fisher, M., *T. Stafford, T. Moon, and A. Thurber (2023). QGreenland (v3) [software], National Snow and Ice Data Center.
Snow, Tasha, Millstein, Joanna, Scheick, Jessica, Sauthoff, Wilson, Leong, Wei Ji, Colliander, James, Pérez, Fernando, James Munroe, Felikson, Denis, Sutterley, Tyler, & Siegfried, Matthew. (2023). CryoCloud JupyterBook (2023.01.26). Zenodo. 10.5281/zenodo.7576602

* Denotes co-equal lead authorship

Yuvaraj (Yuvi)

Building participatory open infrastructure for scientific & educational use cases. A Project Jupyter team member working on infrastructure related projects. Ex Wikimedia and ex-GNOME. Let’s eliminate accidental complexities wherever we find them.

Highlights:

10+ years experience building open infrastructure for scientific and educational communities
Jupyter Distinguished Contributor
Leader in the JupyterHub and Binder projects
Served as the Infrastructure Architect behind UC Berkeley’s scalable DataHub
Former ops engineer at Wikimedia and GNOME.

Yuvaraj (Yuvi) | 2i2c

Announcing our public roadmap for open development

Why we’re opening up our roadmap #

We hope to use a shared roadmap to funnel more resources into open source #

Fixing the mybinder.org usage analytics archive

Learn more #

Acknowledgements #

Combating tcp scanning on mybinder.org with the tcpflowkiller

Why this matters #

Learn more #

Acknowledgements #

From scattered effort to strategic impact: How we're systematizing our Foundational open source contributions

The challenge: Why scattered individual efforts aren’t enough #

Our long-term goal: Multi-stakeholder, resilient communities #

Two key objectives #

Four pilot activities #

Review Pull Requests from non-maintainers #

KPIs #

Issue Triage office hours #

KPIs #

Sponsoring and Mentoring new Maintainers #

KPI #

Increase bus factor and diversity of people making releases #

KPIs #

Criteria for upstream projects to support #

How we’ll implement this #

Who is responsible #

How we’ll fund this work #

Next step: Learning in public #

Acknowledgements #

On being a good open source citizen: supporting a healthy ecosystem through directed and foundational contributions

Everybody has an open source hat and a stakeholder hat #

Directed Contributions benefit the stakeholder you represent #

We plan Directed Contributions according to our roadmap and member feedback #

Foundational Contributions support a healthy open source community #

We plan Foundational Contributions alongside our engineering roadmap #

What’s next #

Sharing JupyterHub's vision for more flexible application deployment at the doepy talk series.

Learn more #

Acknowledgements #

Solving classes of problems, rather than just an instance of a problem (with an example)

The Problem #

An incomplete solution #

What was the class of issues we could fix here? #

Structured solutions #

Moving forward #

Acknowledgements #

Simplifying and speeding up Binder builds with BuildKit

Acknowledgements #

2i2c joins the mybinder.org federation with a cheaper and faster way to deploy BinderHub

Cloud infrastructure has become cheaper and more commodified #

Deploying BinderHub on a single-node VM is cheaper and simpler #

2i2c.mybinder.org now serves 70% of the mybinder.org federation #

Others can join the mybinder.org federation using this approach as well #

Anybody want to fund this? #

Acknowledgements #

Announcing our formal commitment to open technology

Our commitment to open technology #

What does this commitment mean? #

Why are licenses and CLAs important? #

What does this change about 2i2c’s open source commitment? #

Who is this for? #

We’d love feedback #

NASA VEDA & 2i2c Update for Q4 2024 (Oct-Dec 2024)

Automated backups and alerting with jupyterhub-home-nfs #

Enable users to dynamically build environments with jupyterhub-fancy-profiles #

“Open in QGIS” from VEDA UI #

Other updates #

Acknowledgements #

`frx-challenges`: A new tool to host data challenges for Frictionless Research Exchanges

What are Frictionless Research Exchanges #

Why facilitate data challenges? #

Acknowledgements #

Improving the logged-in home page experience in JupyterHub with `jupyterhub-fancy-profiles`

Use this in your JupyterHub #

Credit #

Collaborating with Development Seed to deliver cyberinfrastructure for NASA VEDA

Better image management and testing #

Automatically pulling example notebooks on startup #

Opening specific visualizations in QGIS via URL #

Automated backups and alerting with `jupyterhub-home-nfs` #

Enable users to dynamically build environments with `jupyterhub-fancy-profiles` #

A Standalone `binderhub-service` helm chart #

Sustainably extending KubeSpawner’s `profileList` #

Implementing `unlisted_choice` feature in KubeSpawner #

`jupyterhub-fancy-profiles` #

`jupyterhub/@binderhub-client` npm package #

Defending against cryptojacking with `cryptnono` #