Applications

This section documents the applications we implemented for homework and a class project.

Okra API Batch Job

Generate tables for the Okra-API management command

The strategy is to check a directory for updates, then consolidate those updates into parquet files with the datetime in the file name. The parquet files are then uploaded to the Okra API.
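The first two steps of that strategy can be sketched with the standard library alone. This is a minimal illustration, not the okra implementation: the function names, the modification-time check, and the file name pattern are all assumptions made for the example, and the parquet writing step (for instance with pandas `DataFrame.to_parquet`) is left out.

```python
from datetime import datetime
from pathlib import Path
from typing import List


def find_updated_files(directory: Path, since: float) -> List[Path]:
    """Return files in `directory` modified after `since` (a POSIX timestamp).

    Hypothetical helper: using mtime to detect updates is an assumption.
    """
    return sorted(
        path
        for path in directory.iterdir()
        if path.is_file() and path.stat().st_mtime > since
    )


def parquet_file_name(prefix: str, when: datetime) -> str:
    """Build an output file name that embeds the datetime.

    The '<prefix>_<YYYYmmddTHHMMSS>.parquet' pattern is an assumption.
    """
    return f"{prefix}_{when.strftime('%Y%m%dT%H%M%S')}.parquet"
```

Embedding the datetime in the name keeps each consolidated batch distinct, so an upload step can process files in name order.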

Metrics target the ‘proto/okra_api.proto’ interface.

okra.mgmt_api.msg_iso_date_aggregation(msg, item, status: str, yearmo: str)

Compute IsoDateAggregation message

okra.mgmt_api.msg_repository_info(dal: okra.models.DataAccessLayer, repo_id: str, yearmo: str)

Compute RepositoryInfo message

Note that when only one msg item exists for a given yearmo, the first and last msg_iso_date_aggregation() calls default to using that same msg item for IsoDateAggregation. If no commits exist for a given yearmo, the returned message is empty except for repo_id and yearmo.
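The default behavior described above can be illustrated with plain Python data in place of protobuf messages. This is only a sketch: `repository_info_sketch`, the dict layout, and the `(iso_date, commit_hash)` tuples are hypothetical stand-ins, not the real RepositoryInfo fields.

```python
def repository_info_sketch(repo_id, yearmo, items):
    """Mimic the first/last selection rule with a plain dict.

    `items` is assumed to be a list of (iso_date, commit_hash) tuples
    for one yearmo; this layout is hypothetical.
    """
    info = {"repo_id": repo_id, "yearmo": yearmo}
    if not items:
        # No commits: only repo_id and yearmo are populated.
        return info
    ordered = sorted(items)
    # With a single item, first and last refer to the same item.
    info["first"] = ordered[0]
    info["last"] = ordered[-1]
    return info
```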

Truck Factor

Compute Truck Factor

The truck factor assignment has several parts. This file covers only the truck factor computation itself, run against a database.

References:

Assignment: http://janvitek.org/events/NEU/6050/a4.html
Paper: http://janvitek.org/events/NEU/6050/Ps/truck.pdf

okra.assn4.author_file_owned(owner, project, dal)

Compute file ownership by each author.

Parameters

owner – name of owner
project – name of project
dal – okra.models.DataAccessLayer

Returns

list of file owners based on max number of lines written

Return type

list of sqlalchemy objects
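To make "owner based on max number of lines written" concrete, here is a small in-memory sketch. It does not query a database as author_file_owned() does; the `file_owners` name, the `(file_path, author, lines_added)` row layout, and the alphabetical tie-break are assumptions for illustration.

```python
from collections import defaultdict


def file_owners(rows):
    """Pick one owner per file: the author with the most lines added.

    `rows` is an iterable of (file_path, author, lines_added) tuples,
    a hypothetical stand-in for the database query results.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for path, author, lines in rows:
        totals[path][author] += lines

    owners = {}
    for path, by_author in totals.items():
        # Most lines wins; ties go to the alphabetically first author
        # (the tie-break rule is an assumption).
        owners[path] = min(by_author, key=lambda a: (-by_author[a], a))
    return owners
```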

okra.assn4.author_number_of_files_owned(results)

Number of files owned by author.

Parameters

results – results from author_file_owned()

Returns

{author: number of files owned}

Return type

dict
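The `{author: number of files owned}` mapping can be sketched with `collections.Counter`. The input here is assumed to be `(file_path, author)` pairs rather than the sqlalchemy objects that author_file_owned() returns, and `count_files_owned` is a hypothetical name.

```python
from collections import Counter


def count_files_owned(owner_rows):
    """Count how many files each author owns.

    `owner_rows` is an iterable of (file_path, author) pairs; this
    shape is an assumption standing in for the real query results.
    """
    return dict(Counter(author for _path, author in owner_rows))
```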

okra.assn4.get_truck_factor_by_project(owner, project, dal)

Get the ‘truck factor’ by project.

For each project, and each file, compute how many lines were added by each unique user.
For each project, and each file, find which user created the file.
Given the above two results, compute the ownership of each file.
For each project, and each file, pick an owner.
For each project, rank the users by the number of files they own.
Given all of the above, compute the Truck Factor as the smallest set of users such that they own more than half of the files in the project.

Parameters

owner – name of owner
project – name of project
dal – okra.models.DataAccessLayer

Returns

Truck factor score for a GitHub project, Truck set members

Return type

tuple (int, list)

okra.assn4.smallest_owner_set(authors, total, size=0.5)

Smallest set of authors owning more than half of project files.

Parameters

authors – author_number_of_files_owned() output
total – total_number_of_files_by_project() output

Returns

(number of members in smallest set, smallest set)

Return type

tuple
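One way to find such a set is a greedy pass over authors ranked by files owned. The sketch below is an illustration under stated assumptions, not the okra source: `greedy_owner_set` is a hypothetical name, and treating `size` as the fraction of files the set must exceed is a guess based on the default of 0.5 and the "more than half" description.

```python
def greedy_owner_set(authors, total, size=0.5):
    """Return (count, members) for the smallest set owning > size * total files.

    `authors` maps author -> number of files owned; `total` is the
    number of files in the project. Interpreting `size` as a fraction
    threshold is an assumption.
    """
    # Rank by files owned, largest first; names break ties for determinism.
    ranked = sorted(authors.items(), key=lambda kv: (-kv[1], kv[0]))
    members, owned = [], 0
    for author, count in ranked:
        if owned > total * size:
            break
        members.append(author)
        owned += count
    return len(members), members
```

Because each file has exactly one owner, taking the largest owners first reaches the threshold with the fewest members. If the threshold is never exceeded, every author is returned.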

okra.assn4.total_number_of_contributors_by_project(owner, project, dal)

Compute the total number of contributors by project.

Parameters

owner – name of owner
project – name of project
dal – okra.models.DataAccessLayer

Returns

the total number of contributors in a project

Return type

int

okra.assn4.total_number_of_files_by_project(owner, project, dal)

Compute the total number of files by project.

Parameters

owner – name of owner
project – name of project
dal – okra.models.DataAccessLayer

Returns

the number of files in a project

Return type

int