2022-04-21 meeting minutes

Types of Data To Collect

Community

  • growth: both new interested individuals and their conversion to contributors. data that reflects this dimension:
    • the number of contributors to the code base (github PRs)
    • the number of contributors to design discussions (discord)
    • the number of contributors to requirements (github issues)
  • diversity: the project does not depend on a single organization to stay alive. data that reflects this dimension:
    • the number of organizations contributing to the code base (github PRs)
  • retention: interesting/useful projects attract contributors; healthy projects retain them. data that reflects this dimension:
    • active contributor longevity (github PRs, discord)
  • maturity: gives context to the other dimensions. stats differ across life cycle phases, so what is a red flag for a mature project may not be one for a young project. can be measured by age:
    • when was the first commit
    • frequency of releases (more mature projects have a more regular cadence and are more likely to keep to it)
  • friendliness to new contributors/ideas
    • number of good-first-issues
    • new contributors onboarded
    • can new ideas be accommodated, even if that may lead to forking of the code base?
  • responsiveness: how long until proposed changes (code, design, bug reports, etc.) are given attention? data that reflects this dimension:
    • time to resolve PRs and issues (github)
    • time to respond to questions (discord)
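
A minimal sketch of how a few of the community measures above (contributor count, active contributor longevity, time to resolve PRs) could be computed once PR data has been exported. The record layout, the sample data, and the function names are assumptions made for illustration; real records would come from github.

```python
from datetime import datetime
from statistics import median

# Hypothetical PR records (made-up sample data, not real project data).
PRS = [
    {"author": "alice", "opened": "2022-01-03", "closed": "2022-01-05"},
    {"author": "bob",   "opened": "2022-01-10", "closed": "2022-01-20"},
    {"author": "alice", "opened": "2022-03-01", "closed": "2022-03-02"},
    {"author": "carol", "opened": "2022-03-15", "closed": None},  # still open
]


def _day(text):
    """Parse a YYYY-MM-DD string into a datetime."""
    return datetime.strptime(text, "%Y-%m-%d")


def contributor_count(prs):
    """growth: number of distinct PR authors."""
    return len({pr["author"] for pr in prs})


def contributor_longevity_days(prs):
    """retention: days between each author's first and last opened PR."""
    first, last = {}, {}
    for pr in prs:
        author, opened = pr["author"], _day(pr["opened"])
        first[author] = min(first.get(author, opened), opened)
        last[author] = max(last.get(author, opened), opened)
    return {author: (last[author] - first[author]).days for author in first}


def median_days_to_resolve(prs):
    """responsiveness: median days from open to close, ignoring open PRs."""
    durations = [
        (_day(pr["closed"]) - _day(pr["opened"])).days
        for pr in prs
        if pr["closed"] is not None
    ]
    return median(durations) if durations else None
```

The same shape would apply to github issues or discord threads, with "opened" and "closed" replaced by the question and first-response timestamps.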

Code

  • usefulness: is the project being adopted by customers and tire kickers? data that reflects this dimension:
    • usage information provided by customers and developers
    • number of questions from clients trying to use the code
    • docker pulls
    • release binary downloads
    • tagged online resources: case studies, presentations, mentorship programs
    • number of research publications it generates
  • production-readiness: is the current code base coherent enough to be usable in a real-world scenario? data that reflects this dimension:
    • release number (latest is 1.0.0 or later?)
    • test coverage
    • performance and reliability testing data
    • user documentation
  • Fundamental metrics:
    • commit rate: number of commits per month etc.
    • possibly indicators that would let us catch when a project starts to "cool down" or when people are leaving it for other options
  • Docs:
    • do they exist
    • quality
  • amount of innovation: how cutting edge is the project?
    • ePrint, arXiv
    • measures academic interest
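
One possible way to express the commit rate and a "cool down" indicator mentioned above, as a sketch only. The 3-month window and the 50% threshold are arbitrary assumptions, as are the function names; the group did not agree on any specific definition.

```python
from collections import Counter


def commits_per_month(commit_dates):
    """Count commits per 'YYYY-MM' given ISO date strings (YYYY-MM-DD)."""
    counts = Counter(date[:7] for date in commit_dates)
    return dict(sorted(counts.items()))


def is_cooling_down(monthly_counts, recent=3, threshold=0.5):
    """Flag a project whose recent commit rate dropped well below its baseline.

    monthly_counts: commits per month in chronological order, zeros included.
    recent, threshold: assumed tuning knobs, not agreed values.
    """
    if len(monthly_counts) <= recent:
        return False  # not enough history to compare against
    earlier = monthly_counts[:-recent]
    latest = monthly_counts[-recent:]
    baseline = sum(earlier) / len(earlier)
    if baseline == 0:
        return False  # no earlier activity to decline from
    return sum(latest) / len(latest) < threshold * baseline
```

Because of the maturity point above, a single threshold would likely not fit every project, so the parameters are left configurable.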

The learning WG can help with the badging aspect of this.

The dashboard should be customizable so that not all dimensions are applied to all projects (to avoid steering project maintainers in the wrong direction).
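
A small sketch of what per-project customization could look like: each project opts out of the dimensions that do not fit it. The project name and the exclusions shown are hypothetical placeholders; only the dimension names come from the lists above.

```python
# Dimension names taken from the Community and Code lists above.
ALL_DIMENSIONS = {
    "growth", "diversity", "retention", "maturity",
    "friendliness", "responsiveness",
    "usefulness", "production-readiness",
}

# Hypothetical per-project exclusions (placeholder project name).
EXCLUDED = {
    "example-young-project": {"production-readiness", "diversity"},
}


def dimensions_for(project, excluded=EXCLUDED):
    """Return the dimensions the dashboard should show for a project."""
    return sorted(ALL_DIMENSIONS - excluded.get(project, set()))
```

Projects without an entry get every dimension, so opting out is an explicit, reviewable choice.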

Early-stage metrics (stars, forks, users in chat channels): besides being useful to the TSC, the dashboard can also help project maintainers grow their projects.

Do contributors have conflicting priorities that may lead to a split of the community or a fork into a new project? This should be evaluated on a case-by-case basis; some splits could be beneficial.

Sources/Means to Collect

github (contributors, code activities, PRs)

discord (engagement level, responsiveness)

information contributed by project maintainers and members (technology usage)