Applications

This section documents the applications we implemented for homework and a class project.

Okra API Batch Job

Generate tables for the Okra-API management command

The strategy is to check a directory for updates, then consolidate those updates into Parquet files with the datetime in the file name. The Parquet files are then uploaded to the Okra API.

Metrics target the ‘proto/okra_api.proto’ interface.
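A minimal sketch of the directory check and file naming described above, using only the standard library. The function names `find_updated_files` and `parquet_name`, the `.csv` suffix, and the file-name pattern are illustrative assumptions and are not part of okra; writing the Parquet file itself and the upload step are left out.

```python
from datetime import datetime
from pathlib import Path
from typing import List


def find_updated_files(directory: Path, since: float, suffix: str = ".csv") -> List[Path]:
    """Return files in `directory` modified after the POSIX timestamp `since`."""
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and p.suffix == suffix and p.stat().st_mtime > since
    )


def parquet_name(table: str, now: datetime) -> str:
    """Build a Parquet file name that embeds the given datetime."""
    return f"{table}_{now.strftime('%Y%m%dT%H%M%S')}.parquet"
```

A batch run under these assumptions would call `find_updated_files()` with the timestamp of the previous run, consolidate the returned files, and write the result under the name produced by `parquet_name()`.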

okra.mgmt_api.msg_iso_date_aggregation(msg, item, status: str, yearmo: str)

Compute IsoDateAggregation message

okra.mgmt_api.msg_repository_info(dal: okra.models.DataAccessLayer, repo_id: str, yearmo: str)

Compute RepositoryInfo message

Note: if only one msg item exists for a given yearmo, the default behavior is to use that same item for both the first and the last IsoDateAggregation computed by msg_iso_date_aggregation(). If no commits exist for a given yearmo, the returned message is empty except for repo_id and yearmo.
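The first/last rule can be sketched with plain dictionaries. This is an illustration under stated assumptions, not the okra implementation: the function name `first_last_items`, the `authored` key, and the dictionary shape are hypothetical stand-ins for the protobuf messages.

```python
def first_last_items(items, repo_id, yearmo):
    """Pick the first and last commit items for a month (hypothetical helper)."""
    info = {"repo_id": repo_id, "yearmo": yearmo}
    if not items:
        # No commits for this yearmo: only repo_id and yearmo are populated.
        return info
    ordered = sorted(items, key=lambda item: item["authored"])
    # With a single item, ordered[0] and ordered[-1] are the same object.
    info["first"] = ordered[0]
    info["last"] = ordered[-1]
    return info
```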

Truck Factor

Compute Truck Factor

The truck factor assignment has several parts. This file focuses only on the truck factor computation itself, run against a database.

References:

Assignment: http://janvitek.org/events/NEU/6050/a4.html

Paper: http://janvitek.org/events/NEU/6050/Ps/truck.pdf

okra.assn4.author_file_owned(owner, project, dal)

Compute file ownership by each author.

Parameters
  • owner – name of owner

  • project – name of project

  • dal – okra.models.DataAccessLayer

Returns

list of file owners based on max number of lines written

Return type

list of sqlalchemy objects

okra.assn4.author_number_of_files_owned(results)

Number of files owned by author.

Parameters

results – results from author_file_owned()

Returns

{author: number of files owned}

Return type

dict

okra.assn4.get_truck_factor_by_project(owner, project, dal)

Get the ‘truck factor’ by project.

  1. For each project, and each file, compute how many lines were added by each unique user.

  2. For each project, and each file, find which user created the file.

  3. Given the above two results, compute the ownership of each file.

  4. For each project, and each file, pick an owner.

  5. For each project, rank the users by the number of files they own.

  6. Given all of the above, compute the Truck Factor as the smallest set of users who together own more than half of the files in the project.

Parameters
  • owner – name of owner

  • project – name of project

  • dal – okra.models.DataAccessLayer

Returns

Truck factor score for a GitHub project, Truck set members

Return type

tuple (int, list)
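The ownership and ranking steps above can be sketched on in-memory records instead of database queries. This is a simplified illustration, not the okra implementation: the function names `file_owners` and `files_owned` and the record shape are assumptions, ownership is decided by lines added only (the file-creator step is left out), and ties are broken by author name.

```python
from collections import Counter, defaultdict


def file_owners(records):
    """Map each file to the author who added the most lines to it.

    `records` is an iterable of (file_path, author, lines_added) tuples.
    Ties are broken by author name, which is an assumption of this sketch.
    """
    lines = defaultdict(Counter)
    for file_path, author, lines_added in records:
        lines[file_path][author] += lines_added
    return {
        file_path: min(per_author, key=lambda a: (-per_author[a], a))
        for file_path, per_author in lines.items()
    }


def files_owned(owners):
    """Count files per owner, ranked from most to fewest files owned."""
    return Counter(owners.values()).most_common()
```

The ranked counts from `files_owned()` are the kind of input the final step needs in order to pick the smallest set of owners.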

okra.assn4.smallest_owner_set(authors, total, size=0.5)

Smallest set of authors owning more than half of project files.

Parameters
  • authors – author_number_of_files_owned() output

  • total – total_number_of_files_by_project() output

Returns

(number of members in smallest set, smallest set)

Return type

tuple
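One way to find such a set is a greedy pass over authors sorted by files owned. The sketch below is an assumption about the approach, not the okra source: the name `smallest_owner_set_sketch` is hypothetical, and `size` is interpreted as the fraction of `total` that the set must exceed. Because each file has exactly one owner, taking the largest owners first yields a set of minimum size.

```python
def smallest_owner_set_sketch(authors, total, size=0.5):
    """Greedy sketch: fewest authors owning more than `size` of `total` files.

    `authors` maps author name to number of files owned.
    """
    members = []
    owned = 0
    # Largest owners first; names break ties so the result is deterministic.
    for author, count in sorted(authors.items(), key=lambda kv: (-kv[1], kv[0])):
        if owned > size * total:
            break
        members.append(author)
        owned += count
    return len(members), members
```

If the listed authors never exceed the threshold, this sketch returns all of them; how okra handles that case is not stated here.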

okra.assn4.total_number_of_contributors_by_project(owner, project, dal)

Compute the total number of contributors by project.

Parameters
  • owner – name of owner

  • project – name of project

  • dal – okra.models.DataAccessLayer

Returns

the total number of contributors in a project

Return type

int

okra.assn4.total_number_of_files_by_project(owner, project, dal)

Compute the total number of files by project.

Parameters
  • owner – name of owner

  • project – name of project

  • dal – okra.models.DataAccessLayer

Returns

the number of files in a project

Return type

int