[2/9] Introduce and use language_set

Message ID 20190307205709.21919-3-tom@tromey.com
State New
Headers show
Series
  • Minor minimal symbol improvements
Related show

Commit Message

Tom Tromey March 7, 2019, 8:57 p.m.
I noticed that objfile_per_bfd_storage::demangled_hash_languages is a
std::vector, which seemed quite large for something that,
fundamentally, can be represented as a bit-set.  This patch
reimplements it as a bit-set.

gdb/ChangeLog
2019-03-07  Tom Tromey  <tom@tromey.com>

	* objfiles.h (struct objfile_per_bfd_storage)
	<demangled_hash_languages>: Now a language_set.
	* minsyms.c (add_minsym_to_demangled_hash_table): Update.
	* language.h (class language_set): New.
---
 gdb/ChangeLog  |  7 +++++
 gdb/language.h | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++
 gdb/minsyms.c  |  7 ++---
 gdb/objfiles.h |  6 ++--
 4 files changed, 95 insertions(+), 9 deletions(-)

-- 
2.17.2

Comments

John Baldwin March 7, 2019, 11:04 p.m. | #1
On 3/7/19 12:57 PM, Tom Tromey wrote:
> I noticed that objfile_per_bfd_storage::demangled_hash_languages is a

> std::vector, which seemed quite large for something that,

> fundamentally, can be represented as a bit-set.  This patch

> reimplements it as a bit-set.


Hmm, did you consider using std::bitset<nr_languages> for langauge_set?  You'd
still have to write your own iterator class, so it wouldn't save much in terms
of lines of code.  It just seems a bit odd to write a bitset class from scratch?

-- 
John Baldwin
Tom Tromey March 8, 2019, 10:20 p.m. | #2
>>>>> "John" == John Baldwin <jhb@FreeBSD.org> writes:


John> On 3/7/19 12:57 PM, Tom Tromey wrote:
>> I noticed that objfile_per_bfd_storage::demangled_hash_languages is a

>> std::vector, which seemed quite large for something that,

>> fundamentally, can be represented as a bit-set.  This patch

>> reimplements it as a bit-set.


John> Hmm, did you consider using std::bitset<nr_languages> for langauge_set?  You'd
John> still have to write your own iterator class, so it wouldn't save much in terms
John> of lines of code.  It just seems a bit odd to write a bitset class from scratch?

I didn't really consider it.

I looked now and on my machine, std::bitset<nr_languages> is 8 bytes,
whereas langauge_set is just 4.  So, with the custom implementation, I
could pack it into a hole in objfile_per_bfd_storage.

That said, due to the minsym arrays, objfile_per_bfd_storage is
enormous, so saving 4 bytes probably isn't important.  So, I'll redo
this.

Tom
Tom Tromey March 8, 2019, 10:43 p.m. | #3
John> Hmm, did you consider using std::bitset<nr_languages> for langauge_set?  You'd
John> still have to write your own iterator class, so it wouldn't save much in terms
John> of lines of code.

In this case, because there are just two spots that iterate over this
set, and because we'd need to introduce somewhat ugly casts back to
"enum language" anyway, it seemed just as good to not write an iterator
and just do this in a simpler way.

Tom

commit d52ac096d7ae33144eb3a691a78e061d32833444
Author: Tom Tromey <tom@tromey.com>
Date:   Fri Mar 1 19:55:46 2019 -0700

    Use bitset for demangled_hash_languages
    
    I noticed that objfile_per_bfd_storage::demangled_hash_languages is a
    std::vector, which seemed quite large for something that,
    fundamentally, can be represented as a bitset.  This patch
    reimplements it as a std::bitset.
    
    gdb/ChangeLog
    2019-03-08  Tom Tromey  <tom@tromey.com>
    
            * objfiles.h (struct objfile_per_bfd_storage)
            <demangled_hash_languages>: Now a bitset.
            * minsyms.c (add_minsym_to_demangled_hash_table): Update.
            (lookup_minimal_symbol): Update.

diff --git a/gdb/ChangeLog b/gdb/ChangeLog
index 7ec81d36571..4c952b0a445 100644
--- a/gdb/ChangeLog
+++ b/gdb/ChangeLog
@@ -1,3 +1,10 @@
+2019-03-08  Tom Tromey  <tom@tromey.com>
+
+	* objfiles.h (struct objfile_per_bfd_storage)
+	<demangled_hash_languages>: Now a bitset.
+	* minsyms.c (add_minsym_to_demangled_hash_table): Update.
+	(lookup_minimal_symbol): Update.
+
 2019-03-07  Tom Tromey  <tom@tromey.com>
 
 	* minsyms.h (class minimal_symbol_reader) <record_with_info>:
diff --git a/gdb/minsyms.c b/gdb/minsyms.c
index 0513cfe69f4..cbb45f141c9 100644
--- a/gdb/minsyms.c
+++ b/gdb/minsyms.c
@@ -160,11 +160,7 @@ add_minsym_to_demangled_hash_table (struct minimal_symbol *sym,
       unsigned int hash = search_name_hash (MSYMBOL_LANGUAGE (sym),
 					    MSYMBOL_SEARCH_NAME (sym));
 
-      auto &vec = objfile->per_bfd->demangled_hash_languages;
-      auto it = std::lower_bound (vec.begin (), vec.end (),
-				  MSYMBOL_LANGUAGE (sym));
-      if (it == vec.end () || *it != MSYMBOL_LANGUAGE (sym))
-	vec.insert (it, MSYMBOL_LANGUAGE (sym));
+      objfile->per_bfd->demangled_hash_languages.set (MSYMBOL_LANGUAGE (sym));
 
       struct minimal_symbol **table
 	= objfile->per_bfd->msymbol_demangled_hash;
@@ -354,8 +350,12 @@ lookup_minimal_symbol (const char *name, const char *sfile,
 	    {
 	      /* Once for each language in the demangled hash names
 		 table (usually just zero or one languages).  */
-	      for (auto lang : objfile->per_bfd->demangled_hash_languages)
+	      for (unsigned iter = 0; iter < nr_languages; ++iter)
 		{
+		  if (!objfile->per_bfd->demangled_hash_languages.test (iter))
+		    continue;
+		  enum language lang = (enum language) iter;
+
 		  unsigned int hash
 		    = (lookup_name.search_name_hash (lang)
 		       % MINIMAL_SYMBOL_HASH_SIZE);
@@ -497,8 +497,12 @@ iterate_over_minimal_symbols
   /* The second pass is over the demangled table.  Once for each
      language in the demangled hash names table (usually just zero or
      one).  */
-  for (auto lang : objf->per_bfd->demangled_hash_languages)
+  for (unsigned liter = 0; liter < nr_languages; ++liter)
     {
+      if (!objf->per_bfd->demangled_hash_languages.test (liter))
+	continue;
+
+      enum language lang = (enum language) liter;
       const language_defn *lang_def = language_def (lang);
       symbol_name_matcher_ftype *name_match
 	= get_symbol_name_matcher (lang_def, lookup_name);
diff --git a/gdb/objfiles.h b/gdb/objfiles.h
index c5ce9eec955..47df0023dcc 100644
--- a/gdb/objfiles.h
+++ b/gdb/objfiles.h
@@ -28,6 +28,7 @@
 #include "registry.h"
 #include "gdb_bfd.h"
 #include "psymtab.h"
+#include <bitset>
 #include <vector>
 #include "common/next-iterator.h"
 #include "common/safe-iterator.h"
@@ -313,10 +314,8 @@ struct objfile_per_bfd_storage
   minimal_symbol *msymbol_demangled_hash[MINIMAL_SYMBOL_HASH_SIZE] {};
 
   /* All the different languages of symbols found in the demangled
-     hash table.  A flat/vector-based map is more efficient than a map
-     or hash table here, since this will only usually contain zero or
-     one entries.  */
-  std::vector<enum language> demangled_hash_languages;
+     hash table.  */
+  std::bitset<nr_languages> demangled_hash_languages;
 };
 
 /* Master structure for keeping track of each file from which
John Baldwin March 8, 2019, 11:43 p.m. | #4
On 3/8/19 2:43 PM, Tom Tromey wrote:
> John> Hmm, did you consider using std::bitset<nr_languages> for langauge_set?  You'd

> John> still have to write your own iterator class, so it wouldn't save much in terms

> John> of lines of code.

> 

> In this case, because there are just two spots that iterate over this

> set, and because we'd need to introduce somewhat ugly casts back to

> "enum language" anyway, it seemed just as good to not write an iterator

> and just do this in a simpler way.


Looks good to me.

-- 
John Baldwin

Patch

diff --git a/gdb/language.h b/gdb/language.h
index d56ec200208..d57eee96a12 100644
--- a/gdb/language.h
+++ b/gdb/language.h
@@ -747,4 +747,88 @@  private:
   enum language m_lang;
 };
 
+/* A set object specialized for keeping track of languages.  */
+
+class language_set
+{
+public:
+
+  language_set ()
+  {
+  }
+
+  /* Insert LANG into this set.  */
+  void insert (enum language lang)
+  {
+    m_mask |= 1 << lang;
+  }
+
+  /* The underlying type used to represent the languages.  */
+  typedef unsigned mask_type;
+
+  /* An iterator that iterates over languages contained in a language
+     set.  */
+  class iterator
+  {
+  public:
+
+    iterator (unsigned int n, mask_type mask)
+      : m_index (n),
+	m_mask (mask)
+    {
+      gdb_assert (m_index <= nr_languages);
+      update ();
+    }
+
+    enum language operator* () const
+    {
+      return (enum language) m_index;
+    }
+
+    iterator &operator++ ()
+    {
+      ++m_index;
+      update ();
+      return *this;
+    }
+
+    bool operator!= (const iterator &other) const
+    {
+      return m_index != other.m_index;
+    }
+
+  private:
+
+    /* A helper function that will advance M_INDEX to the next
+       language in the set.  */
+    void update ()
+    {
+      while (m_index < nr_languages && (m_mask & (1 << m_index)) == 0)
+	++m_index;
+    }
+
+    /* The current index of the iterator.  */
+    unsigned int m_index;
+    /* The mask from the underlying set.  */
+    mask_type m_mask;
+  };
+
+  iterator begin () const
+  {
+    return iterator (0, m_mask);
+  }
+
+  iterator end () const
+  {
+    return iterator (nr_languages, m_mask);
+  }
+
+private:
+
+  /* A bit mask of languages which are included in this set.  */
+  mask_type m_mask = 0;
+
+  gdb_static_assert (nr_languages <= (sizeof (mask_type) * CHAR_BIT));
+};
+
 #endif /* defined (LANGUAGE_H) */
diff --git a/gdb/minsyms.c b/gdb/minsyms.c
index 0513cfe69f4..6395cc4ccab 100644
--- a/gdb/minsyms.c
+++ b/gdb/minsyms.c
@@ -160,11 +160,8 @@  add_minsym_to_demangled_hash_table (struct minimal_symbol *sym,
       unsigned int hash = search_name_hash (MSYMBOL_LANGUAGE (sym),
 					    MSYMBOL_SEARCH_NAME (sym));
 
-      auto &vec = objfile->per_bfd->demangled_hash_languages;
-      auto it = std::lower_bound (vec.begin (), vec.end (),
-				  MSYMBOL_LANGUAGE (sym));
-      if (it == vec.end () || *it != MSYMBOL_LANGUAGE (sym))
-	vec.insert (it, MSYMBOL_LANGUAGE (sym));
+      objfile->per_bfd->demangled_hash_languages.insert
+	(MSYMBOL_LANGUAGE (sym));
 
       struct minimal_symbol **table
 	= objfile->per_bfd->msymbol_demangled_hash;
diff --git a/gdb/objfiles.h b/gdb/objfiles.h
index c5ce9eec955..843c44e1d24 100644
--- a/gdb/objfiles.h
+++ b/gdb/objfiles.h
@@ -313,10 +313,8 @@  struct objfile_per_bfd_storage
   minimal_symbol *msymbol_demangled_hash[MINIMAL_SYMBOL_HASH_SIZE] {};
 
   /* All the different languages of symbols found in the demangled
-     hash table.  A flat/vector-based map is more efficient than a map
-     or hash table here, since this will only usually contain zero or
-     one entries.  */
-  std::vector<enum language> demangled_hash_languages;
+     hash table.  */
+  language_set demangled_hash_languages;
 };
 
 /* Master structure for keeping track of each file from which