GSoC Week 6 & Week 7 - Diving deep into GitLab and Gitaly!
Well, hello friend!
I was not able to publish my week 6 blog because health issues kept me from focusing on work. But now I am feeling great! Now, let's talk about the project. So far in Git, my patch adding mailmap support to `git-cat-file` has been merged into the git `next` branch and will soon be promoted to `master`. Thanks a ton Junio, Phillip, Đoàn, Ævar, Johannes, Christian and John for helping me with the reviews and making the patch better. Here is the link to the patch: https://public-inbox.org/git/20220718195102.66321-1-siddharthasthana31@gmail.com/
The Mid-Term Evaluation
This first month of GSoC was very exciting! I am also very happy that I have passed my GSoC midterm evaluation and got my first stipend. I will probably buy myself a green lightsaber.
So now, let's talk about the things in GitLab and Gitaly that I have been working on this week!
Contributors Graph
As mentioned in my previous blog, I had shared some of my findings related to the contributors graph, where GitLab was using the `FindCommits` RPC when the contributors graph is loaded. I tried to dig deep into the GitLab side of the project to find out how GitLab and Gitaly interact, and I approached the search systematically.

My first step was to find the route that serves the contributors graph page. I visited the GitLab routes page at http://localhost:3000/rails/info/routes, where I found out that the contributors graph uses the `/*namespace_id/:project_id/-/graphs/:id` path and the `projects/graphs#show` controller#action. So when we visit the contributors graph page, the action that is called is `show()`, which is defined in `graph_controller.rb`.

Following is the snippet of the `show` function in `graph_controller.rb`. The most interesting thing happening here is the call to `fetch_graph()`.

```ruby
def show
  respond_to do |format|
    format.html
    format.json do
      fetch_graph
    end
  end
end
```

Following is the snippet of the
`fetch_graph` function.

```ruby
def fetch_graph
  @commits = @project.repository.commits(@ref, limit: 6000, skip_merges: true)
  @log = []

  @commits.each do |commit|
    @log << {
      author_name: commit.author_name,
      author_email: commit.author_email,
      date: commit.committed_date.strftime("%Y-%m-%d")
    }
  end

  render json: @log.to_json
end
```

The first line of the function makes a call to a function called
`commits` and passes `ref` (which is `master`), `limit: 6000` and `skip_merges: true`. The `commits` function is defined in `app/models/repository.rb`.

Following is the snippet of the `commits` function:

```ruby
def commits(ref = nil, opts = {})
  options = {
    repo: raw_repository,
    ref: ref,
    path: opts[:path],
    author: opts[:author],
    follow: Array(opts[:path]).length == 1,
    limit: opts[:limit],
    offset: opts[:offset],
    skip_merges: !!opts[:skip_merges],
    after: opts[:after],
    before: opts[:before],
    all: !!opts[:all],
    first_parent: !!opts[:first_parent],
    order: opts[:order],
    literal_pathspec: opts.fetch(:literal_pathspec, true),
    trailers: opts[:trailers]
  }

  commits = Gitlab::Git::Commit.where(options)
  commits = Commit.decorate(commits, container) if commits.present?

  CommitCollection.new(container, commits, ref)
end
```

So, the first thing we do here is build the options that will be used for the git command. The arguments that we passed are used to set the corresponding options, and the options in our case will look like the following:
```text
repo ==> <Gitlab::Git::Repository: flightjs/Flight>
ref ==> master
path ==>
author ==>
follow ==> true
limit ==> 6000
offset ==> 40
skip_merges ==> true
after ==>
before ==>
all ==> false
first_parent ==> false
order ==>
literal_pathspec ==> true
trailers ==>
```

Then we make a call to `Gitlab::Git::Commit.where(options)`, which is defined in `lib/gitlab/git/commit.rb`. In the `where` function, we make a call to the `log` function defined in `lib/gitlab/git/repository.rb`.
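The idioms that `commits` uses to normalize its arguments are easy to miss on a first read. The following is a minimal standalone sketch, not GitLab code: `normalize_options` is a hypothetical helper that copies only a few of the keys from the `commits` snippet, just to show what `!!`, `Array(...)` and `fetch` with a default do.

```ruby
# Hypothetical helper (not part of GitLab) mirroring a few keys of the
# options hash built in Repository#commits.
def normalize_options(ref = nil, opts = {})
  {
    ref: ref,
    path: opts[:path],
    # Array(x) wraps a single value in an array and leaves arrays as they
    # are, so this is true only when exactly one path is given.
    follow: Array(opts[:path]).length == 1,
    limit: opts[:limit],
    # `!!` turns nil (key not given) into false and any truthy value into true.
    skip_merges: !!opts[:skip_merges],
    all: !!opts[:all],
    # fetch falls back to the default only when the key is absent.
    literal_pathspec: opts.fetch(:literal_pathspec, true)
  }
end

options = normalize_options('master', limit: 6000, skip_merges: true)
puts options.inspect
```

Options that the caller never mentions therefore end up as `nil`, `false`, or their default, instead of being missing from the hash.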
Following is a snippet of the `log` function:

```ruby
def log(options)
  default_options = {
    limit: 10,
    offset: 0,
    path: nil,
    author: nil,
    follow: false,
    skip_merges: false,
    after: nil,
    before: nil,
    all: false
  }

  options = default_options.merge(options)
  options[:offset] ||= 0

  limit = options[:limit]
  if limit == 0 || !limit.is_a?(Integer)
    raise ArgumentError, "invalid Repository#log limit: #{limit.inspect}"
  end

  wrapped_gitaly_errors do
    gitaly_commit_client.find_commits(options)
  end
end
```

As we can see, we are again updating the options and making a very interesting function call,
`gitaly_commit_client.find_commits(options)`. This is the call to a function called `find_commits` defined in the Gitaly client.

Following is the snippet of the `find_commits` function:

```ruby
def find_commits(options)
  request = Gitaly::FindCommitsRequest.new(
    repository: @gitaly_repo,
    limit: options[:limit],
    offset: options[:offset],
    follow: options[:follow],
    skip_merges: options[:skip_merges],
    all: !!options[:all],
    first_parent: !!options[:first_parent],
    global_options: parse_global_options!(options),
    disable_walk: true, # This option is deprecated. The 'walk' implementation is being removed.
    trailers: options[:trailers]
  )
  request.after = GitalyClient.timestamp(options[:after]) if options[:after]
  request.before = GitalyClient.timestamp(options[:before]) if options[:before]
  request.revision = encode_binary(options[:ref]) if options[:ref]
  request.author = encode_binary(options[:author]) if options[:author]
  request.order = options[:order].upcase.sub('DEFAULT', 'NONE') if options[:order].present?

  request.paths = encode_repeated(Array(options[:path])) if options[:path].present?

  response = GitalyClient.call(@repository.storage, :commit_service, :find_commits, request, timeout: GitalyClient.medium_timeout)
  consume_commits_response(response)
end
```

The `call` function here sends the request to Gitaly and invokes the `FindCommits` RPC there. This is how the RPC is called from GitLab. Gitaly will respond with the information from the commit objects, which is further processed by GitLab, and the contributors graph is generated!
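To make that last step a bit more concrete, here is a small sketch of how entries in the shape rendered by `fetch_graph` could be turned into per-author commit counts. This is only an illustration under my own assumptions: `commits_per_author` is a hypothetical helper and the sample entries are made up, so the real aggregation in GitLab may look quite different.

```ruby
require 'json'

# Hypothetical helper: counts commits per author email.
def commits_per_author(log_entries)
  counts = Hash.new(0)
  log_entries.each { |entry| counts[entry['author_email']] += 1 }
  counts
end

# Made-up sample in the same shape as the JSON rendered by fetch_graph.
sample_json = [
  { author_name: 'Jane Doe', author_email: 'jane@example.com', date: '2022-07-18' },
  { author_name: 'J. Doe',   author_email: 'jane@example.com', date: '2022-07-19' },
  { author_name: 'John Roe', author_email: 'john@example.com', date: '2022-07-19' }
].to_json

entries = JSON.parse(sample_json)
puts commits_per_author(entries).inspect
```

Grouping by email like this also hints at why mailmap matters here: the same person committing under two different emails would be counted as two contributors.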
Now, how does Gitaly extract information from the bare git repositories it interacts with?
To understand that, let's talk about a very important git plumbing command, `cat-file`. The command provides content or type and size information for repository objects. For example, we can execute `git cat-file -p HEAD`, and we will get all the information about the `HEAD` commit object. GitLab makes extensive use of this command across its features. The command has an option called `--batch`, which prints object information and contents for each object provided on stdin. Gitaly keeps a `git cat-file --batch` process running, so all we have to do is give this process the revisions, and it will provide content according to the type of each revision. So, in the `FindCommits` RPC, we first get all the revisions by issuing a `git-log` command along with all the options that we received from GitLab. Now that we have the revisions, we pass them to the stdin of the `git cat-file --batch` process and stream the information to GitLab.
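The output of `git cat-file --batch` has a simple framing: for each object there is a header line `<oid> <type> <size>`, then exactly `<size>` bytes of content, then a newline. The sketch below parses that framing from any IO object. It is not Gitaly's implementation (Gitaly is written in Go); `BatchObject` and `read_batch_object` are names I made up, and the sample stream is handcrafted rather than real git output.

```ruby
require 'stringio'

BatchObject = Struct.new(:oid, :type, :size, :content)

# Reads one object from a `git cat-file --batch` style stream.
# Returns nil at the end of the stream.
def read_batch_object(io)
  header = io.gets
  return nil if header.nil?

  oid, type, size = header.chomp.split(' ')
  # Unknown objects are reported as "<name> missing" with no content.
  return BatchObject.new(oid, type, 0, nil) if type == 'missing'

  byte_count = Integer(size)
  content = io.read(byte_count)
  io.read(1) # consume the newline that follows the content
  BatchObject.new(oid, type, byte_count, content)
end

# Handcrafted sample stream, not real git output.
body = "author Jane Doe <jane@example.com> 1658173862 +0000\n"
stream = StringIO.new("#{'a' * 40} commit #{body.bytesize}\n#{body}\nnosuchrev missing\n")

while (object = read_batch_object(stream))
  puts "#{object.type} #{object.size}"
end
```

With a real repository, the same function could read from the stdout of a long-running `git cat-file --batch` process, for example one started with `Open3.popen2`, after the revisions have been written to its stdin.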
Now that we know that `git-cat-file` is used to get the information for constructing the contributors graph, we can make use of my patches adding mailmap support to `git-cat-file`. But first we must benchmark them and understand whether they incur any performance issues. My mentors suggested that I use the `hyperfine` tool to benchmark the performance of `git-cat-file` with and without the `--use-mailmap` option.
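As a reminder of what `--use-mailmap` buys us: a `.mailmap` file maps the names and emails recorded in commits to canonical ones, so one person does not show up as several contributors. Below is a deliberately simplified sketch of that lookup. It matches on the commit email only, handles just the `Proper Name <commit@email>` and `Proper Name <proper@email> <commit@email>` style entries, and is in no way git's actual implementation; `MAILMAP_ENTRY`, `parse_mailmap` and `resolve` are hypothetical names.

```ruby
# One or two <email> parts, with an optional name in front of the first one.
MAILMAP_ENTRY = /\A(?<name>[^<]*)<(?<email1>[^>]+)>\s*(?:[^<]*<(?<email2>[^>]+)>)?\s*\z/

# Builds a lookup table keyed by the (lowercased) email found in commits.
def parse_mailmap(text)
  map = {}
  text.each_line do |line|
    line = line.strip
    next if line.empty? || line.start_with?('#')

    match = MAILMAP_ENTRY.match(line)
    next unless match

    name = match[:name].strip
    name = nil if name.empty?
    if match[:email2]
      # "Proper Name <proper@email> <commit@email>": replace name and email.
      map[match[:email2].downcase] = { name: name, email: match[:email1] }
    else
      # "Proper Name <commit@email>": replace the name only.
      map[match[:email1].downcase] = { name: name, email: nil }
    end
  end
  map
end

# Returns the canonical [name, email] pair for a commit identity.
def resolve(map, name, email)
  entry = map[email.downcase]
  return [name, email] unless entry

  [entry[:name] || name, entry[:email] || email]
end

sample = "# made-up entries\nJane Doe <jane@example.com> <jd@old.example.com>\nJohn Roe <john@example.com>\n"
map = parse_mailmap(sample)
puts resolve(map, 'jd', 'jd@old.example.com').inspect
```

The extra lookup per identity is exactly the kind of work that could cost time in `--batch` mode, which is why benchmarking comes first.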
Benchmarking git-cat-file
In order to benchmark and compare the performance of `git-cat-file` with and without the `--use-mailmap` option in `--batch` mode, I created two shell scripts.
The first one, named `cat-file.sh`, just invokes `git cat-file --batch`.
The second one, named `cat-file-mailmap.sh`, invokes `git cat-file --use-mailmap --batch`.
Then I passed these shell scripts to `hyperfine` for benchmarking using the following command:

```shell
hyperfine -N './cat-file.sh' './cat-file-mailmap.sh'
```

I then compared both runs to check for any performance implications. The benchmark shows only a factor of 1.02 between running `cat-file` in `--batch` mode with and without `--use-mailmap`, which I don't think is much of a performance difference. But I am waiting for my mentors' suggestions about this analysis.
So, my next task will be to:
- Work on the Gitaly side of the project to make Gitaly use the `--use-mailmap` option implemented in `git cat-file`: https://gitlab.com/gitlab-org/gitaly/-/issues/4364
So yeah, that was week 6 & 7. Thanks a lot for reading!
Will be back next week with another blog. Peace!