Fix slow and non-deterministic behavior of isspace() and tolower()

Message ID 20190609151704.16061-1-shawn@git.icu
State New
Headers show
Series
  • Fix slow and non-deterministic behavior of isspace() and tolower()
Related show

Commit Message

Shawn Landden June 9, 2019, 3:17 p.m.
I was getting 8% and 6% cpu usage in tolower() and isspace(),
respectively, waiting for a breakpoint on ppc64el.

Also, gdb doesn't want non-deterministic behavior here.
---
 gdb/utils.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

-- 
2.20.1

Comments

Eli Zaretskii June 9, 2019, 3:42 p.m. | #1
> From: Shawn Landden <shawn@git.icu>

> Cc: Shawn Landden <shawn@git.icu>

> Date: Sun,  9 Jun 2019 10:17:04 -0500

> 

> I was getting 8% and 6% cpu usage in tolower() and isspace(),

> respectively, waiting for a breakpoint on ppc64el.

> 

> Also, gdb doesn't want non-deterministic behavior here.


Instead of inventing our own wheel, how about using the Gnulib's
c-ctype module?
Shawn Landden June 9, 2019, 3:51 p.m. | #2
On Sun, Jun 9, 2019 at 10:42 AM Eli Zaretskii <eliz@gnu.org> wrote:
>

> > From: Shawn Landden <shawn@git.icu>

> > Cc: Shawn Landden <shawn@git.icu>

> > Date: Sun,  9 Jun 2019 10:17:04 -0500

> >

> > I was getting 8% and 6% cpu usage in tolower() and isspace(),

> > respectively, waiting for a breakpoint on ppc64el.

> >

> > Also, gdb doesn't want non-deterministic behavior here.

>

> Instead of inventing our own wheel, how about using the Gnulib's

> c-ctype module?

The tolower() really should only apply to languages that need it, like Ada.
It is kinda a bug to use it with C, which is case sensitive. Also we
probably don't need
the full isspace() either, and these are simple functions. I think it
is much better to be able to
see the full algorithm right in front of you. (even if I just did the
base minimum to speed this up,
without having to study it).

-Shawn Landden
John Baldwin June 10, 2019, 9:13 p.m. | #3
On 6/9/19 8:17 AM, Shawn Landden wrote:
> I was getting 8% and 6% cpu usage in tolower() and isspace(),

> respectively, waiting for a breakpoint on ppc64el.

> 

> Also, gdb doesn't want non-deterministic behavior here.


These routines are not always macros.  For example, FreeBSD's ctype.h
exposes these as functions for C++ and only as macros for pure C.
Perhaps it would be better to define custom functions that don't use
the POSIX names to avoid clashing with whatever libc provides if we
want something with very specific behavior.

-- 
John Baldwin

Patch

diff --git a/gdb/utils.c b/gdb/utils.c
index 9686927473..d36b942cc5 100644
--- a/gdb/utils.c
+++ b/gdb/utils.c
@@ -2626,10 +2626,29 @@  strcmp_iw (const char *string1, const char *string2)
    user searches for "foo", then strcmp will sort "foo" before "foo$".
    Then lookup_partial_symbol will notice that strcmp_iw("foo$",
    "foo") is false, so it won't proceed to the actual match of
    "foo(int)" with "foo".  */
 
+/* glibc versions of these have non-deterministic locale-dependant behavior,
+   and are very slow, taking 8% and 6% of total CPU time with some use-cases */
+#undef isspace
+static int isspace(int c)
+{
+  return c == ' ' || (unsigned)c-'\t' < 5;
+}
+#undef isupper
+static int isupper(int c)
+{
+  return (unsigned)c-'A' < 26;
+}
+#undef tolower
+static int tolower(int c)
+{
+  if (isupper(c)) return c | 32;
+  return c;
+}
+
 int
 strcmp_iw_ordered (const char *string1, const char *string2)
 {
   const char *saved_string1 = string1, *saved_string2 = string2;
   enum case_sensitivity case_pass = case_sensitive_off;