build-many-glibcs.py: Use --depth 1 in Git checkout

Message ID 877e06udhu.fsf@oldenburg2.str.redhat.com
State New
Series
  • build-many-glibcs.py: Use --depth 1 in Git checkout

Commit Message

Florian Weimer Feb. 28, 2020, 12:01 p.m.
The history is not used by build-many-glibcs.py itself.
--replace-sources deletes an existing source tree before switching
the version.

This works around networking and Git server performance issues to some
degree.

-----
 scripts/build-many-glibcs.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
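For illustration, the clone invocation this patch produces can be sketched as a small Python helper. `clone_command` and `shallow_clone` are hypothetical names (the script builds the argument list inline in `Context`), and the URL below is a placeholder. `--depth 1` fetches only the tip commit of the requested branch and, per the git-clone documentation, implies `--single-branch`.

```python
import subprocess
from typing import List


def clone_command(git_url: str, git_branch: str, srcdir: str,
                  depth: int = 1) -> List[str]:
    """Return the argument vector for a shallow 'git clone' of one branch."""
    # '--depth N' truncates the fetched history to the last N commits.
    return ['git', 'clone', '-q', '-b', git_branch,
            '--depth', str(depth), git_url, srcdir]


def shallow_clone(git_url: str, git_branch: str, srcdir: str) -> None:
    """Run the clone, raising CalledProcessError if Git fails."""
    subprocess.run(clone_command(git_url, git_branch, srcdir), check=True)


print(clone_command('https://example.org/glibc.git', 'master',
                    '/tmp/src/glibc'))
```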

Comments

Joseph Myers Feb. 28, 2020, 9:19 p.m. | #1
On Fri, 28 Feb 2020, Florian Weimer wrote:

> The history is not used by build-many-glibcs.py itself.
> --replace-sources deletes an existing source tree before switching
> the version.
> 
> This works around networking and Git server performance issues to some
> degree.

I think making a shallow clone is unhelpful in practice since it's 
routinely useful to revert to previous versions when finding where a 
problem appeared.

-- 
Joseph S. Myers
joseph@codesourcery.com
Florian Weimer Feb. 28, 2020, 9:33 p.m. | #2
* Joseph Myers:

> On Fri, 28 Feb 2020, Florian Weimer wrote:
>
>> The history is not used by build-many-glibcs.py itself.
>> --replace-sources deletes an existing source tree before switching
>> the version.
>> 
>> This works around networking and Git server performance issues to some
>> degree.
>
> I think making a shallow clone is unhelpful in practice since it's 
> routinely useful to revert to previous versions when finding where a 
> problem appeared.

Would it be too cumbersome to run git fetch --unshallow in this case,
before starting the investigation?
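A minimal sketch of that step, assuming hypothetical helper names `is_shallow` and `unshallow`: check whether the checkout is shallow, and only then fetch the rest of the history. `git rev-parse --is-shallow-repository` prints `true` or `false` in reasonably recent Git versions.

```python
import subprocess


def is_shallow(repo_dir: str) -> bool:
    """Return True if repo_dir is a shallow clone."""
    r = subprocess.run(['git', 'rev-parse', '--is-shallow-repository'],
                       cwd=repo_dir, stdout=subprocess.PIPE,
                       universal_newlines=True, check=True)
    return r.stdout.strip() == 'true'


def unshallow(repo_dir: str) -> None:
    """Fetch the missing history so older revisions can be checked out."""
    # Git refuses '--unshallow' on a complete repository, so check first.
    if is_shallow(repo_dir):
        subprocess.run(['git', 'fetch', '--unshallow'],
                       cwd=repo_dir, check=True)
```

Because `--depth` implies `--single-branch`, this completes the history of the cloned branch only; other branches would additionally need something like `git remote set-branches origin '*'` followed by a fetch.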

It really reduces checkout time, particularly if sourceware.org is under
load.

Thanks,
Florian
Joseph Myers Feb. 28, 2020, 9:46 p.m. | #3
On Fri, 28 Feb 2020, Florian Weimer wrote:

> Would it be too cumbersome to run git fetch --unshallow in this case,
> before starting the investigation?
> 
> It really reduces checkout time, particularly if sourceware.org is under
> load.

Once the new sourceware is up and running, maybe we should work with 
overseers on (a) ensuring all active repositories have repack.writeBitmaps 
set to true, (b) doing repacks of repositories using suitable aggressive 
very-large-repository settings as used for GCC (git repack --window=1250 
--depth=250 -b -AdFfi) and (c) setting up cron jobs to do a more 
lightweight packing with bitmaps (Richard Earnshaw says git repack 
--window-memory=500m --window=250 --depth=50 -b -A -d -i) on all 
repositories, say weekly.  The effect of such better packing of 
repositories with bitmaps should be to speed up checkouts.
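The maintenance proposed here is server-side work for the overseers. Purely as an illustration, and written in Python only to match the other examples on this page, a job body applying the quoted settings might look like this. `maintenance_commands` and `maintain` are hypothetical names, and the weekly scheduling (for example via cron) is not shown.

```python
import subprocess
from typing import List

# One-off aggressive repack with the very-large-repository settings quoted for GCC.
AGGRESSIVE_REPACK = ['git', 'repack', '--window=1250', '--depth=250',
                     '-b', '-AdFfi']
# Lighter periodic repack with bitmaps, using the settings quoted above.
WEEKLY_REPACK = ['git', 'repack', '--window-memory=500m', '--window=250',
                 '--depth=50', '-b', '-A', '-d', '-i']


def maintenance_commands(aggressive: bool = False) -> List[List[str]]:
    """Return the Git commands to run inside one repository."""
    enable_bitmaps = ['git', 'config', 'repack.writeBitmaps', 'true']
    return [enable_bitmaps, AGGRESSIVE_REPACK if aggressive else WEEKLY_REPACK]


def maintain(repo_dir: str, aggressive: bool = False) -> None:
    """Run the commands in repo_dir, stopping at the first failure."""
    for cmd in maintenance_commands(aggressive):
        subprocess.run(cmd, cwd=repo_dir, check=True)
```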

-- 
Joseph S. Myers
joseph@codesourcery.com
Florian Weimer Feb. 29, 2020, 1:35 p.m. | #4
* Joseph Myers:

> On Fri, 28 Feb 2020, Florian Weimer wrote:
>
>> Would it be too cumbersome to run git fetch --unshallow in this case,
>> before starting the investigation?
>> 
>> It really reduces checkout time, particularly if sourceware.org is under
>> load.
>
> Once the new sourceware is up and running, maybe we should work with 
> overseers on (a) ensuring all active repositories have repack.writeBitmaps 
> set to true, (b) doing repacks of repositories using suitable aggressive 
> very-large-repository settings as used for GCC (git repack --window=1250 
> --depth=250 -b -AdFfi) and (c) setting up cron jobs to do a more 
> lightweight packing with bitmaps (Richard Earnshaw says git repack 
> --window-memory=500m --window=250 --depth=50 -b -A -d -i) on all 
> repositories, say weekly.  The effect of such better packing of 
> repositories with bitmaps should be to speed up checkouts.

It won't speed up the local verification step of incoming packs.  But
given the wildly varying sourceware.org load (20 to over 50 minutes for
the checkout step), I have trouble getting accurate numbers.

The other issue is the size of build images containing the checkout.
There, the difference is 2.8 GiB vs 4.2 GiB.
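The two image sizes quoted above work out to the following saving (plain arithmetic on the figures in this message, nothing newly measured):

```python
full_gib = 4.2     # build image with full Git history
shallow_gib = 2.8  # build image with --depth 1 checkouts

saved_gib = full_gib - shallow_gib
saved_fraction = saved_gib / full_gib

print(f"saved {saved_gib:.1f} GiB ({saved_fraction:.0%})")
# prints: saved 1.4 GiB (33%)
```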

Thanks,
Florian

8<------------------------------------------------------------------8<
Subject: build-many-glibcs.py: Add --shallow option

The history is not used by build-many-glibcs.py itself.
--replace-sources deletes an existing source tree before switching
the version.  But some users prefer to have the full history
available, therefore make shallow clones optional with the --shallow
option.

-----
 scripts/build-many-glibcs.py | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/scripts/build-many-glibcs.py b/scripts/build-many-glibcs.py
index 6add364720..c822f3b588 100755
--- a/scripts/build-many-glibcs.py
+++ b/scripts/build-many-glibcs.py
@@ -80,7 +80,7 @@ class Context(object):
     """The global state associated with builds in a given directory."""
 
     def __init__(self, topdir, parallelism, keep, replace_sources, strip,
-                 full_gcc, action):
+                 full_gcc, action, shallow=False):
         """Initialize the context."""
         self.topdir = topdir
         self.parallelism = parallelism
@@ -88,6 +88,7 @@ class Context(object):
         self.replace_sources = replace_sources
         self.strip = strip
         self.full_gcc = full_gcc
+        self.shallow = shallow
         self.srcdir = os.path.join(topdir, 'src')
         self.versions_json = os.path.join(self.srcdir, 'versions.json')
         self.build_state_json = os.path.join(topdir, 'build-state.json')
@@ -852,7 +853,12 @@ class Context(object):
             subprocess.run(['git', 'pull', '-q'],
                            cwd=self.component_srcdir(component), check=True)
         else:
-            subprocess.run(['git', 'clone', '-q', '-b', git_branch, git_url,
+            if self.shallow:
+                depth_arg = ('--depth', '1')
+            else:
+                depth_arg = ()
+            subprocess.run(['git', 'clone', '-q', '-b', git_branch,
+                            *depth_arg, git_url,
                             self.component_srcdir(component)], check=True)
         r = subprocess.run(['git', 'rev-parse', 'HEAD'],
                            cwd=self.component_srcdir(component),
@@ -1771,6 +1777,8 @@ def get_parser():
                         help='Strip installed glibc libraries')
     parser.add_argument('--full-gcc', action='store_true',
                         help='Build GCC with all languages and libsanitizer')
+    parser.add_argument('--shallow', action='store_true',
+                        help='Do not download Git history during checkout')
     parser.add_argument('topdir',
                         help='Toplevel working directory')
     parser.add_argument('action',
@@ -1790,7 +1798,8 @@ def main(argv):
     opts = parser.parse_args(argv)
     topdir = os.path.abspath(opts.topdir)
     ctx = Context(topdir, opts.parallelism, opts.keep, opts.replace_sources,
-                  opts.strip, opts.full_gcc, opts.action)
+                  opts.strip, opts.full_gcc, opts.action,
+                  shallow=opts.shallow)
     ctx.run_builds(opts.action, opts.configs)
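With the revised patch, shallow checkouts are opt-in, e.g. `scripts/build-many-glibcs.py --shallow /path/to/topdir checkout` (the top-level path is a placeholder). The cut-down parser below is a hypothetical stand-in for `get_parser`, not the script's full option set; it only shows how `--shallow` reaches the clone arguments.

```python
import argparse


def get_parser() -> argparse.ArgumentParser:
    """Cut-down parser with only the options relevant to this patch."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--shallow', action='store_true',
                        help='Do not download Git history during checkout')
    parser.add_argument('topdir', help='Toplevel working directory')
    parser.add_argument('action', help='What to do')
    return parser


def depth_args(shallow: bool) -> tuple:
    # Same selection as the patch: extra clone arguments only when shallow.
    return ('--depth', '1') if shallow else ()


opts = get_parser().parse_args(['--shallow', '/tmp/bmg', 'checkout'])
print(opts.shallow, depth_args(opts.shallow))
# prints: True ('--depth', '1')
```

Giving `Context.__init__` a `shallow=False` keyword default, as the patch does, keeps existing callers that pass only the original arguments working.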
Joseph Myers March 2, 2020, 10:08 p.m. | #5
On Sat, 29 Feb 2020, Florian Weimer wrote:

> Subject: build-many-glibcs.py: Add --shallow option


OK.

-- 
Joseph S. Myers
joseph@codesourcery.com
Andreas Schwab March 3, 2020, 8:47 a.m. | #6
On Feb 28 2020, Florian Weimer wrote:

> It really reduces checkout time, particularly if sourceware.org is under
> load.

Why can't you just use a local mirror?

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."
Florian Weimer March 3, 2020, 9:23 a.m. | #7
* Andreas Schwab:

> On Feb 28 2020, Florian Weimer wrote:
>
>> It really reduces checkout time, particularly if sourceware.org is under
>> load.
>
> Why can't you just use a local mirror?

I was trying to push builds out to a cluster of machines, and I don't
know where local is for each machine.

Thanks,
Florian
Andreas Schwab March 3, 2020, 9:31 a.m. | #8
On Mar 03 2020, Florian Weimer wrote:

> * Andreas Schwab:
>
>> On Feb 28 2020, Florian Weimer wrote:
>>
>>> It really reduces checkout time, particularly if sourceware.org is under
>>> load.
>>
>> Why can't you just use a local mirror?
>
> I was trying to push builds out to a cluster of machines, and I don't
> know where local is for each machine.

You can make it anywhere you like.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."
Florian Weimer March 3, 2020, 2:46 p.m. | #9
* Andreas Schwab:

> On Mar 03 2020, Florian Weimer wrote:
>
>> * Andreas Schwab:
>>
>>> On Feb 28 2020, Florian Weimer wrote:
>>>
>>>> It really reduces checkout time, particularly if sourceware.org is under
>>>> load.
>>>
>>> Why can't you just use a local mirror?
>>
>> I was trying to push builds out to a cluster of machines, and I don't
>> know where local is for each machine.
>
> You can make it anywhere you like.

Sure, and with --shallow, it's a little bit faster even.

Thanks,
Florian
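The mirror suggestion and `--shallow` can be combined: a mirror near the cluster absorbs the load on sourceware.org, and shallow clones from it keep each checkout small. A sketch, assuming hypothetical helper names and paths; how a mirror URL would be fed into build-many-glibcs.py is not covered here. One detail matters: Git ignores `--depth` when cloning from a plain local path (it warns and suggests a `file://` URL), so the sketch clones via `file://`.

```python
import os
import subprocess


def update_mirror(upstream_url: str, mirror_dir: str) -> None:
    """Create a bare mirror of upstream_url, or refresh it if it exists."""
    if os.path.isdir(mirror_dir):
        subprocess.run(['git', 'remote', 'update', '--prune'],
                       cwd=mirror_dir, check=True)
    else:
        subprocess.run(['git', 'clone', '-q', '--mirror',
                        upstream_url, mirror_dir], check=True)


def clone_from_mirror(mirror_dir: str, branch: str, srcdir: str,
                      shallow: bool = True) -> None:
    """Clone one branch from the mirror, optionally with --depth 1."""
    # A file:// URL forces the regular transport, so --depth is honored.
    url = 'file://' + os.path.abspath(mirror_dir)
    cmd = ['git', 'clone', '-q', '-b', branch]
    if shallow:
        cmd += ['--depth', '1']
    subprocess.run(cmd + [url, srcdir], check=True)
```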

Patch

diff --git a/scripts/build-many-glibcs.py b/scripts/build-many-glibcs.py
index 6add364720..d6f66a8216 100755
--- a/scripts/build-many-glibcs.py
+++ b/scripts/build-many-glibcs.py
@@ -852,7 +852,8 @@  class Context(object):
             subprocess.run(['git', 'pull', '-q'],
                            cwd=self.component_srcdir(component), check=True)
         else:
-            subprocess.run(['git', 'clone', '-q', '-b', git_branch, git_url,
+            subprocess.run(['git', 'clone', '-q', '-b', git_branch,
+                            '--depth', '1', git_url,
                             self.component_srcdir(component)], check=True)
         r = subprocess.run(['git', 'rev-parse', 'HEAD'],
                            cwd=self.component_srcdir(component),