Applications
This section documents the applications we implemented for homework and a class project.
Okra API Batch Job
Generate tables for the Okra-API management command
The strategy is to check a directory for updates, then consolidate those updates into Parquet files with the datetime in the file name. The Parquet files are then uploaded to the Okra API.
The metrics target the ‘proto/okra_api.proto’ interface.
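As a rough illustration of the first half of that strategy, the sketch below finds files changed since a given time and builds a datetime-stamped Parquet file name. It is not the okra implementation: the function names `find_updated_files` and `parquet_name`, the use of file modification times to detect updates, and the timestamp layout are all assumptions made for this example. The Parquet write and the upload are left out.

```python
from datetime import datetime
from pathlib import Path
from typing import List


def find_updated_files(directory: Path, since: float) -> List[Path]:
    """Return regular files in `directory` modified after `since`.

    `since` is a POSIX timestamp; comparing modification times is an
    assumption about how "updates" are detected.
    """
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and p.stat().st_mtime > since
    )


def parquet_name(prefix: str, when: datetime) -> str:
    """Build an output file name that embeds the datetime (hypothetical layout)."""
    return f"{prefix}_{when.strftime('%Y%m%dT%H%M%S')}.parquet"
```

A batch run would call `find_updated_files()` with the time of the previous run, consolidate the returned files, and write the result under the name given by `parquet_name()`.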
okra.mgmt_api.msg_iso_date_aggregation(msg, item, status: str, yearmo: str)
Compute IsoDateAggregation message
okra.mgmt_api.msg_repository_info(dal: okra.models.DataAccessLayer, repo_id: str, yearmo: str)
Compute RepositoryInfo message
Note that when only one msg item exists for a given yearmo, the default behavior is to use that same item for both the first and the last IsoDateAggregation computed by msg_iso_date_aggregation(). If no commits exist for a given yearmo, the returned message is empty except for repo_id and yearmo.
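The first/last behavior described above can be sketched with plain dictionaries standing in for the protobuf messages. This is only an illustration: `repository_info_sketch`, the `first`, `last`, and `authored` keys, and the `2019-01` yearmo format are hypothetical and are not taken from `okra_api.proto`.

```python
from typing import Dict, List


def repository_info_sketch(repo_id: str, yearmo: str, commits: List[Dict]) -> Dict:
    """Build a RepositoryInfo-like dict from the commits of one year-month."""
    info = {"repo_id": repo_id, "yearmo": yearmo}
    if not commits:
        # No commits for this yearmo: only repo_id and yearmo are filled in.
        return info
    # ISO 8601 date strings in the same format sort chronologically as plain strings.
    ordered = sorted(commits, key=lambda c: c["authored"])
    # With a single commit, ordered[0] and ordered[-1] are the same item.
    info["first"] = ordered[0]
    info["last"] = ordered[-1]
    return info
```

With exactly one commit in the month, `first` and `last` refer to the same item, which mirrors the default described in the note.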
Truck Factor
Compute Truck Factor
The truck factor assignment includes several components. This file focuses only on computing the truck factor from a database.
- References:
Assignment: http://janvitek.org/events/NEU/6050/a4.html
Paper: http://janvitek.org/events/NEU/6050/Ps/truck.pdf
Compute file ownership by each author.
- Parameters
owner – name of owner
project – name of project
dal – okra.models.DataAccessLayer
- Returns
list of file owners based on max number of lines written
- Return type
list of sqlalchemy objects
Number of files owned by author.
- Parameters
results – results from author_file_owned()
- Returns
{author: number of files owned}
- Return type
dict
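The two helpers documented above can be approximated in memory as follows. This is a sketch under assumptions, not the database-backed code in okra.assn4: the names `file_owners` and `files_owned`, the `(file_path, author, lines_added)` row layout, and the alphabetical tie-break are all invented for the example.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def file_owners(rows: Iterable[Tuple[str, str, int]]) -> Dict[str, str]:
    """Map each file to the author who added the most lines to it.

    `rows` holds (file_path, author, lines_added) records.
    """
    totals: Dict[str, Counter] = defaultdict(Counter)
    for path, author, lines in rows:
        totals[path][author] += lines
    # Ties go to the alphabetically first author (an arbitrary, deterministic choice).
    return {
        path: min(counts, key=lambda a: (-counts[a], a))
        for path, counts in totals.items()
    }


def files_owned(owners: Dict[str, str]) -> Dict[str, int]:
    """Count files per owner: {author: number of files owned}."""
    return dict(Counter(owners.values()))
```

Feeding the output of `file_owners()` into `files_owned()` gives the per-author counts that the ranking step relies on.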
okra.assn4.get_truck_factor_by_project(owner, project, dal)
Get the ‘truck factor’ by project.
For each project, and each file, compute how many lines were added by each unique user.
For each project, and each file, find which user created the file.
Given the above two results compute the ownership of each file.
For each project and each file, pick an owner.
For each project, rank the users by the number of files they own.
Given all of the above, compute the truck factor as the smallest set of users who together own more than half of the files in the project.
- Parameters
owner – name of owner
project – name of project
dal – okra.models.DataAccessLayer
- Returns
Truck factor score for a GitHub project, Truck set members
- Return type
tuple (int, list)
okra.assn4.smallest_owner_set(authors, total, size=0.5)
Smallest set of authors owning more than half of project files.
- Parameters
authors – author_number_of_files_owned() output
total – total_number_of_files_by_project() output
- Returns
(number of members in smallest set, smallest set)
- Return type
tuple
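One way to compute such a set is a greedy pass over authors ranked by files owned. The sketch below assumes `authors` is an `{author: number of files owned}` dict and interprets `size` as the fraction of `total` that must be exceeded; both are guesses from the documented signature, and `smallest_owner_set_sketch` is a hypothetical name, not the okra function.

```python
from typing import Dict, List, Tuple


def smallest_owner_set_sketch(
    authors: Dict[str, int], total: int, size: float = 0.5
) -> Tuple[int, List[str]]:
    """Return (number of members, members) of the smallest owning set."""
    # Rank by files owned, largest first; names break ties deterministically.
    ranked = sorted(authors.items(), key=lambda kv: (-kv[1], kv[0]))
    members: List[str] = []
    covered = 0
    for author, owned in ranked:
        if covered > size * total:
            break  # the set already owns more than the required share
        members.append(author)
        covered += owned
    return len(members), members
```

Because each file has exactly one owner, taking the largest owners first yields a set of minimal size. If the counts never exceed the threshold, every author is returned.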