Skip to content

WIP: Extend SourceLocation to 64 bits. #146314

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 21 commits into
base: main
Choose a base branch
from
Draft

Conversation

hokein
Copy link
Collaborator

@hokein hokein commented Jun 30, 2025

See discussion: https://discourse.llvm.org/t/revisiting-64-bit-source-locations/86556

This patch extends the SourceLocation structure from 4 bytes to 8 bytes. For now, only the lower 40 bits are used, leaving 24 bits reserved for future use.
In theory, we could use up to 48 bits limited by the current encoding scheme, where a SourceLocation is represented as a 64-bit pair <ModuleFileIndex, SourceLocationOffset>, with ModuleFileIndex taking 16 bits. Starting with 40 bits should be sufficient, and the limit can be easily adjusted later if needed.

AST-related changes

  • moved SourceLocation out of some AST nodes’ StmtBitFields, as the bitfields no longer have enough space.
  • moved the source range of CXXOperatorNameExpr to the heap to avoid increasing the size of DeclarationNameLoc. This minimizes the growth of frequently used nodes like DeclRefExpr, which now increases from 32 to 40 bytes.
  • moved NumArgs into CallExpr’s bitfields (20 bits, which should be sufficient).

Performance

libclang

To avoid ABI breakage in libclang, we continue using 32-bit source locations. These are converted from the 64-bit locations with bounds checking. If the value doesn’t fit, an invalid source location is returned.

TODO

  • update documentation and inline comments related to SourceLocation.
  • split out the binary search optimization in getFileIDLocal into a separate patch.
  • report detailed size impact of AST node changes.
  • revisit the getExprLoc optimization for CallExpr ([Clang] Optimize some getBeginLoc implementations #141058). Currently, allocation size is increased from 32 to 40 bytes without further tuning.

hokein added 12 commits June 26, 2025 14:44
fix

Reduce the Stmt size back to 8 bytes.

Reduce the CallExpr size

Fix the ObjCContainerDecl bit field

Change the SourceLocation::UIntTy to uint64_t

Update other SourceManager's getDecomposedSpellingLoc APIs, and fix many
failing tests.

Remaining failures:

  Clang :: Index/IBOutletCollection.m
  Clang :: Index/annotate-macro-args.m
  Clang :: Index/annotate-module.m
  Clang :: Index/annotate-tokens-pp.c
  Clang :: Index/annotate-tokens.c
  Clang :: Index/annotate-toplevel-in-objccontainer.m
  Clang :: Index/hidden-redecls.m
  Clang :: Index/index-module-with-vfs.m
  Clang :: Index/index-module.m
  Clang :: Index/index-pch-objc.m
  Clang :: Index/index-pch-with-module.m
  Clang :: Index/index-pch.cpp
  Clang :: Index/targeted-annotation.c
  Clang :: Lexer/SourceLocationsOverflow.c
  Clang-Unit :: ./AllClangUnitTests/PPMemoryAllocationsTest/PPMacroDefinesAllocations
  Clang-Unit :: ./AllClangUnitTests/SourceLocationEncoding/Individual
  Clang-Unit :: ./AllClangUnitTests/SourceLocationEncoding/Sequence
  Clang-Unit :: libclang/./libclangTests/14/53
  Clang-Unit :: libclang/./libclangTests/45/53
  Clang-Unit :: libclang/./libclangTests/47/53
  Clang-Unit :: libclang/./libclangTests/48/53
  Clang-Unit :: libclang/./libclangTests/49/53
  Clang-Unit :: libclang/./libclangTests/50/53
  Clang-Unit :: libclang/./libclangTests/52/53

Fix libclang failures

Fix Rewrite APIs

Fix PPMemoryAllocationsTest

Fix SourceLocationEncodingTest

More unsigned -> SourceLocation::UIntTy changes in the SourceManager APIs

Update the type of std::pair<FileID, unsigned> in CIndex.cpp

Fix SourceLocationEncodingTest

Tweak the SourceLocation Implementation.

The source location has a Bit which specify the number of bits used
for the offset. 40 by default;

Make MathExtra templates constexpr

Test Bits=64 perf

Try 48 bits

No bitfields

Fix CallExpr optimization.
by moving out the two source locations for CXXOpName from DeclarationNameLoc
Copy link

github-actions bot commented Jun 30, 2025

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff HEAD~1 HEAD --extensions cpp,h -- clang-tools-extra/clang-tidy/readability/FunctionCognitiveComplexityCheck.cpp clang-tools-extra/clangd/unittests/SourceCodeTests.cpp clang-tools-extra/include-cleaner/lib/HTMLReport.cpp clang/include/clang/AST/ASTContext.h clang/include/clang/AST/DeclBase.h clang/include/clang/AST/DeclObjC.h clang/include/clang/AST/DeclarationName.h clang/include/clang/AST/Expr.h clang/include/clang/AST/ExprCXX.h clang/include/clang/AST/ExprConcepts.h clang/include/clang/AST/ExternalASTSource.h clang/include/clang/AST/Stmt.h clang/include/clang/Basic/Diagnostic.h clang/include/clang/Basic/SourceLocation.h clang/include/clang/Basic/SourceManager.h clang/include/clang/Rewrite/Core/Rewriter.h clang/include/clang/Sema/MultiplexExternalSemaSource.h clang/include/clang/Serialization/ASTBitCodes.h clang/include/clang/Serialization/ASTReader.h clang/include/clang/Serialization/SourceLocationEncoding.h clang/lib/AST/ASTContext.cpp clang/lib/AST/ASTImporter.cpp clang/lib/AST/DeclarationName.cpp clang/lib/AST/Expr.cpp clang/lib/AST/ExprCXX.cpp clang/lib/AST/ExprConcepts.cpp clang/lib/AST/ExternalASTSource.cpp clang/lib/AST/Stmt.cpp clang/lib/Basic/Diagnostic.cpp clang/lib/Basic/SourceLocation.cpp clang/lib/Basic/SourceManager.cpp clang/lib/Format/FormatTokenLexer.cpp clang/lib/Parse/ParseStmtAsm.cpp clang/lib/Rewrite/Rewriter.cpp clang/lib/Sema/MultiplexExternalSemaSource.cpp clang/lib/Sema/SemaDecl.cpp clang/lib/Sema/SemaLambda.cpp clang/lib/Sema/SemaOverload.cpp clang/lib/Serialization/ASTReader.cpp clang/lib/Serialization/ASTReaderStmt.cpp clang/lib/Serialization/ASTWriter.cpp clang/lib/Serialization/ASTWriterDecl.cpp clang/lib/Serialization/ASTWriterStmt.cpp clang/tools/libclang/CIndex.cpp clang/tools/libclang/CXIndexDataConsumer.cpp clang/tools/libclang/CXSourceLocation.cpp clang/tools/libclang/CXSourceLocation.h clang/tools/libclang/Indexing.cpp clang/unittests/Lex/PPMemoryAllocationsTest.cpp clang/unittests/Serialization/SourceLocationEncodingTest.cpp
View the diff from clang-format here.
diff --git a/clang/include/clang/AST/DeclObjC.h b/clang/include/clang/AST/DeclObjC.h
index 56fc4ba41..804215cb8 100644
--- a/clang/include/clang/AST/DeclObjC.h
+++ b/clang/include/clang/AST/DeclObjC.h
@@ -1093,9 +1093,7 @@ public:
 
   SourceLocation getAtStartLoc() const { return AtStart; }
 
-  void setAtStartLoc(SourceLocation Loc) {
-    AtStart = Loc;
-  }
+  void setAtStartLoc(SourceLocation Loc) { AtStart = Loc; }
 
   // Marks the end of the container.
   SourceRange getAtEndRange() const { return AtEnd; }
diff --git a/clang/include/clang/AST/DeclarationName.h b/clang/include/clang/AST/DeclarationName.h
index db7973bcc..bbb91fc14 100644
--- a/clang/include/clang/AST/DeclarationName.h
+++ b/clang/include/clang/AST/DeclarationName.h
@@ -703,7 +703,7 @@ class DeclarationNameLoc {
 
   // The location (if any) of the operator keyword is stored elsewhere.
   struct CXXOpName {
-    CXXOperatorSourceInfo* OInfo;
+    CXXOperatorSourceInfo *OInfo;
   };
 
   // The location (if any) of the operator keyword is stored elsewhere.
@@ -740,8 +740,7 @@ public:
   SourceLocation getCXXOperatorNameBeginLoc() const {
     if (!CXXOperatorName.OInfo)
       return {};
-    return 
-        CXXOperatorName.OInfo->BeginOpNameLoc;
+    return CXXOperatorName.OInfo->BeginOpNameLoc;
   }
 
   /// Return the end location of the getCXXOperatorNameRange() range.
@@ -773,7 +772,7 @@ public:
     DNL.setNamedTypeLoc(TInfo);
     return DNL;
   }
-  
+
   /// Construct location information for a non-literal C++ operator.
   static DeclarationNameLoc
   makeCXXOperatorNameLoc(CXXOperatorSourceInfo *OInfo) {
@@ -838,7 +837,7 @@ public:
       return nullptr;
     return LocInfo.getNamedTypeInfo();
   }
-  
+
   /// setNamedTypeInfo - Sets the source type info associated to
   /// the name. Assumes it is a constructor, destructor or conversion.
   void setNamedTypeInfo(TypeSourceInfo *TInfo) {
diff --git a/clang/include/clang/AST/Expr.h b/clang/include/clang/AST/Expr.h
index 6d1c2df6b..cf0e80900 100644
--- a/clang/include/clang/AST/Expr.h
+++ b/clang/include/clang/AST/Expr.h
@@ -1195,7 +1195,9 @@ public:
     : Expr(OpaqueValueExprClass, Empty) {}
 
   /// Retrieve the location of this expression.
-  SourceLocation getLocation() const { return SourceLocation::getFromRawEncoding(OpaqueValueExprBits.Loc); }
+  SourceLocation getLocation() const {
+    return SourceLocation::getFromRawEncoding(OpaqueValueExprBits.Loc);
+  }
 
   SourceLocation getBeginLoc() const LLVM_READONLY {
     return SourceExpr ? SourceExpr->getBeginLoc() : getLocation();
@@ -1269,9 +1271,9 @@ class DeclRefExpr final
   friend class ASTStmtReader;
   friend class ASTStmtWriter;
   friend TrailingObjects;
-  
-      /// The location of the declaration name itself.
-      SourceLocation Loc;
+
+  /// The location of the declaration name itself.
+  SourceLocation Loc;
 
   /// The declaration that we are referencing.
   ValueDecl *D;
@@ -2006,8 +2008,8 @@ class PredefinedExpr final
       private llvm::TrailingObjects<PredefinedExpr, Stmt *> {
   friend class ASTStmtReader;
   friend TrailingObjects;
-   /// The location of this PredefinedExpr.
-   SourceLocation Loc;
+  /// The location of this PredefinedExpr.
+  SourceLocation Loc;
   // PredefinedExpr is optionally followed by a single trailing
   // "Stmt *" for the predefined identifier. It is present if and only if
   // hasFunctionName() is true and is always a "StringLiteral *".
@@ -3121,7 +3123,9 @@ public:
   /// Bluntly set a new number of arguments without doing any checks whatsoever.
   /// Only used during construction of a CallExpr in a few places in Sema.
   /// FIXME: Find a way to remove it.
-  void setNumArgsUnsafe(unsigned NewNumArgs) { CallExprBits.NumArgs = NewNumArgs; }
+  void setNumArgsUnsafe(unsigned NewNumArgs) {
+    CallExprBits.NumArgs = NewNumArgs;
+  }
 
   typedef ExprIterator arg_iterator;
   typedef ConstExprIterator const_arg_iterator;
@@ -3306,7 +3310,7 @@ class MemberExpr final
 
   /// MemberLoc - This is the location of the member name.
   SourceLocation MemberLoc;
-  
+
   SourceLocation OperatorLoc;
 
   size_t numTrailingObjects(OverloadToken<NestedNameSpecifierLoc>) const {
diff --git a/clang/include/clang/AST/ExprCXX.h b/clang/include/clang/AST/ExprCXX.h
index 98f40b8a8..18fc4dcef 100644
--- a/clang/include/clang/AST/ExprCXX.h
+++ b/clang/include/clang/AST/ExprCXX.h
@@ -3929,7 +3929,8 @@ public:
 
   /// Retrieve the location of the '->' or '.' operator.
   SourceLocation getOperatorLoc() const {
-    return SourceLocation::getFromRawEncoding(CXXDependentScopeMemberExprBits.OperatorLoc);
+    return SourceLocation::getFromRawEncoding(
+        CXXDependentScopeMemberExprBits.OperatorLoc);
   }
 
   /// Retrieve the nested-name-specifier that qualifies the member name.
@@ -4656,7 +4657,8 @@ public:
   }
 
   SourceLocation getNameLoc() const {
-    return SourceLocation::getFromRawEncoding(SubstNonTypeTemplateParmExprBits.NameLoc);
+    return SourceLocation::getFromRawEncoding(
+        SubstNonTypeTemplateParmExprBits.NameLoc);
   }
   SourceLocation getBeginLoc() const { return getNameLoc(); }
   SourceLocation getEndLoc() const { return getNameLoc(); }
diff --git a/clang/include/clang/AST/Stmt.h b/clang/include/clang/AST/Stmt.h
index 9a363f25f..1a04e2f5f 100644
--- a/clang/include/clang/AST/Stmt.h
+++ b/clang/include/clang/AST/Stmt.h
@@ -154,7 +154,7 @@ protected:
     /// floating-point features.
     LLVM_PREFERRED_TYPE(bool)
     unsigned HasFPFeatures : 1;
-  
+
     unsigned NumStmts;
   };
 
@@ -455,8 +455,6 @@ protected:
     unsigned NonOdrUseReason : 2;
     LLVM_PREFERRED_TYPE(bool)
     unsigned IsImmediateEscalating : 1;
-
-
   };
 
 
@@ -526,8 +524,6 @@ protected:
     /// It is 0 otherwise.
     LLVM_PREFERRED_TYPE(bool)
     unsigned HasFPFeatures : 1;
-
-
   };
 
   class UnaryExprOrTypeTraitExprBitfields {
@@ -582,8 +578,8 @@ protected:
     /// Trailing objects. See the definition of CallExpr.
     LLVM_PREFERRED_TYPE(bool)
     unsigned HasTrailingSourceLoc : 1;
-    
-    unsigned NumArgs:20;
+
+    unsigned NumArgs : 20;
   };
   enum { NumCallExprBits = 52 };
 
@@ -629,7 +625,7 @@ protected:
     /// This is the location of the -> or . in the expression.
     // SourceLocation OperatorLoc;
   };
-  
+
   // 8 bytes
   class CastExprBitfields {
     friend class CastExpr;
@@ -1047,8 +1043,6 @@ protected:
     unsigned ConstructionKind : 3;
     LLVM_PREFERRED_TYPE(bool)
     unsigned IsImmediateEscalating : 1;
-
-
   };
 
   class ExprWithCleanupsBitfields {
@@ -1097,7 +1091,7 @@ protected:
     /// the trailing objects.
     LLVM_PREFERRED_TYPE(bool)
     uint64_t HasFirstQualifierFoundInScope : 1;
-    
+
     /// The location of the '->' or '.' operator.
     LLVM_PREFERRED_TYPE(SourceLocation)
     uint64_t OperatorLoc : SourceLocation::Bits;
@@ -1175,7 +1169,7 @@ protected:
   class SubstNonTypeTemplateParmExprBitfields {
     friend class ASTStmtReader;
     friend class SubstNonTypeTemplateParmExpr;
-    
+
     LLVM_PREFERRED_TYPE(ExprBitfields)
     uint64_t : NumExprBits;
 
@@ -1313,8 +1307,8 @@ protected:
     /// bit is set to true.
     LLVM_PREFERRED_TYPE(bool)
     uint64_t IsUnique : 1;
-    
-     /// The location of the non-type template parameter reference.
+
+    /// The location of the non-type template parameter reference.
     LLVM_PREFERRED_TYPE(SourceLocation)
     uint64_t Loc : SourceLocation::Bits;
   };
@@ -1978,8 +1972,8 @@ class CaseStmt final
   //   with a range. Present if and only if caseStmtIsGNURange() is true.
   enum { LhsOffset = 0, SubStmtOffsetFromRhs = 1 };
   enum { NumMandatoryStmtPtr = 2 };
-      /// The location of the "case" or "default" keyword.
-      SourceLocation KeywordLoc;
+  /// The location of the "case" or "default" keyword.
+  SourceLocation KeywordLoc;
   unsigned numTrailingObjects(OverloadToken<Stmt *>) const {
     return NumMandatoryStmtPtr + caseStmtIsGNURange();
   }
@@ -2241,8 +2235,8 @@ class AttributedStmt final
       private llvm::TrailingObjects<AttributedStmt, const Attr *> {
   friend class ASTStmtReader;
   friend TrailingObjects;
-    /// The location of the attribute.
-    SourceLocation AttrLoc;
+  /// The location of the attribute.
+  SourceLocation AttrLoc;
   Stmt *SubStmt;
 
   AttributedStmt(SourceLocation Loc, ArrayRef<const Attr *> Attrs,
@@ -3116,7 +3110,6 @@ public:
 /// ContinueStmt - This represents a continue.
 class ContinueStmt : public Stmt {
 public:
-
   ContinueStmt(SourceLocation CL) : Stmt(ContinueStmtClass) {
     setContinueLoc(CL);
   }
diff --git a/clang/include/clang/Basic/SourceLocation.h b/clang/include/clang/Basic/SourceLocation.h
index 91352a4aa..5f0a5db8b 100644
--- a/clang/include/clang/Basic/SourceLocation.h
+++ b/clang/include/clang/Basic/SourceLocation.h
@@ -165,10 +165,12 @@ public:
 
   bool getRawEncoding32(uint32_t &Result) const {
     // A mask that isolates this check to the required range higher of bits.
-    static constexpr uint64_t RangeMask = llvm::maskTrailingOnes<uint64_t>(Bits - 32) << 31;
+    static constexpr uint64_t RangeMask =
+        llvm::maskTrailingOnes<uint64_t>(Bits - 32) << 31;
 
     // Check if the ID can be safely compressed to a 32-bit integer.
-    // The truncation is only possible if all higher bits of the ID are all identical:
+    // The truncation is only possible if all higher bits of the ID are all
+    // identical:
     //   all 0s for the local offset, or all 1s for loaded offset
     if ((ID ^ (ID << 1)) & RangeMask)
       return false; // won't fit
diff --git a/clang/include/clang/Basic/SourceManager.h b/clang/include/clang/Basic/SourceManager.h
index 930e48d6c..1f9727411 100644
--- a/clang/include/clang/Basic/SourceManager.h
+++ b/clang/include/clang/Basic/SourceManager.h
@@ -721,7 +721,7 @@ class SourceManager : public RefCountedBase<SourceManager> {
   SmallVector<SrcMgr::SLocEntry, 0> LocalSLocEntryTable;
 
   SmallVector<SourceLocation::UIntTy, 0> LocalLocOffsetTable;
-  
+
   /// The table of SLocEntries that are loaded from other modules.
   ///
   /// Negative FileIDs are indexes into this table. To get from ID to an index,
@@ -1290,7 +1290,7 @@ public:
     if (!E)
       return std::make_pair(FileID(), 0);
 
-    auto Offset = Loc.getOffset()-E->getOffset();
+    auto Offset = Loc.getOffset() - E->getOffset();
     if (Loc.isFileID())
       return std::make_pair(FID, Offset);
 
@@ -1307,7 +1307,7 @@ public:
     if (!E)
       return std::make_pair(FileID(), 0);
 
-    auto Offset = Loc.getOffset()-E->getOffset();
+    auto Offset = Loc.getOffset() - E->getOffset();
     if (Loc.isFileID())
       return std::make_pair(FID, Offset);
     return getDecomposedSpellingLocSlowCase(E, Offset);
@@ -1981,8 +1981,9 @@ private:
 
   FileIDAndOffset
   getDecomposedExpansionLocSlowCase(const SrcMgr::SLocEntry *E) const;
-  FileIDAndOffset getDecomposedSpellingLocSlowCase(const SrcMgr::SLocEntry *E,
-                                                   SourceLocation::UIntTy Offset) const;
+  FileIDAndOffset
+  getDecomposedSpellingLocSlowCase(const SrcMgr::SLocEntry *E,
+                                   SourceLocation::UIntTy Offset) const;
   void computeMacroArgsCache(MacroArgsMap &MacroArgsCache, FileID FID) const;
   void associateFileChunkWithMacroArgExp(MacroArgsMap &MacroArgsCache,
                                          FileID FID,
diff --git a/clang/lib/AST/ASTContext.cpp b/clang/lib/AST/ASTContext.cpp
index 689def9c6..df2279a4f 100644
--- a/clang/lib/AST/ASTContext.cpp
+++ b/clang/lib/AST/ASTContext.cpp
@@ -7104,7 +7104,8 @@ ASTContext::getNameForTemplate(TemplateName Name,
     } else {
       DName = DeclarationNames.getCXXOperatorName(TN.getOperator());
       // DNInfo work in progress: FIXME: source locations?
-      DeclarationNameLoc DNLoc = DeclarationNameLoc::makeCXXOperatorNameLoc(nullptr);
+      DeclarationNameLoc DNLoc =
+          DeclarationNameLoc::makeCXXOperatorNameLoc(nullptr);
       return DeclarationNameInfo(DName, NameLoc, DNLoc);
     }
   }
diff --git a/clang/lib/Basic/Diagnostic.cpp b/clang/lib/Basic/Diagnostic.cpp
index 9e176c1bc..b2cd532d1 100644
--- a/clang/lib/Basic/Diagnostic.cpp
+++ b/clang/lib/Basic/Diagnostic.cpp
@@ -204,8 +204,8 @@ DiagnosticsEngine::DiagStateMap::lookup(SourceManager &SrcMgr,
   return F->lookup(Decomp.second);
 }
 
-DiagnosticsEngine::DiagState *
-DiagnosticsEngine::DiagStateMap::File::lookup(SourceLocation::UIntTy Offset) const {
+DiagnosticsEngine::DiagState *DiagnosticsEngine::DiagStateMap::File::lookup(
+    SourceLocation::UIntTy Offset) const {
   auto OnePastIt =
       llvm::partition_point(StateTransitions, [=](const DiagStatePoint &P) {
         return P.Offset <= Offset;
diff --git a/clang/lib/Basic/SourceManager.cpp b/clang/lib/Basic/SourceManager.cpp
index b7cbcd54b..37036302c 100644
--- a/clang/lib/Basic/SourceManager.cpp
+++ b/clang/lib/Basic/SourceManager.cpp
@@ -943,9 +943,8 @@ FileIDAndOffset SourceManager::getDecomposedExpansionLocSlowCase(
   return std::make_pair(FID, Offset);
 }
 
-FileIDAndOffset
-SourceManager::getDecomposedSpellingLocSlowCase(const SrcMgr::SLocEntry *E,
-                                                SourceLocation::UIntTy Offset) const {
+FileIDAndOffset SourceManager::getDecomposedSpellingLocSlowCase(
+    const SrcMgr::SLocEntry *E, SourceLocation::UIntTy Offset) const {
   // If this is an expansion record, walk through all the expansion points.
   FileID FID;
   SourceLocation Loc;
diff --git a/clang/lib/Rewrite/Rewriter.cpp b/clang/lib/Rewrite/Rewriter.cpp
index e4dca072b..28d9092e1 100644
--- a/clang/lib/Rewrite/Rewriter.cpp
+++ b/clang/lib/Rewrite/Rewriter.cpp
@@ -131,7 +131,7 @@ std::string Rewriter::getRewrittenText(CharSourceRange Range) const {
 }
 
 SourceLocation::UIntTy Rewriter::getLocationOffsetAndFileID(SourceLocation Loc,
-                                              FileID &FID) const {
+                                                            FileID &FID) const {
   assert(Loc.isValid() && "Invalid location");
   FileIDAndOffset V = SourceMgr->getDecomposedLoc(Loc);
   FID = V.first;
diff --git a/clang/lib/Sema/SemaLambda.cpp b/clang/lib/Sema/SemaLambda.cpp
index e172a099f..a47db06c3 100644
--- a/clang/lib/Sema/SemaLambda.cpp
+++ b/clang/lib/Sema/SemaLambda.cpp
@@ -1003,9 +1003,8 @@ CXXMethodDecl *Sema::CreateLambdaCallOperator(SourceRange IntroducerRange,
   // and trailing-return-type respectively.
   DeclarationName MethodName =
       Context.DeclarationNames.getCXXOperatorName(OO_Call);
-  DeclarationNameLoc MethodNameLoc =
-      DeclarationNameLoc::makeCXXOperatorNameLoc(
-        Context.getCXXOperatorSourceInfo(IntroducerRange.getBegin()));
+  DeclarationNameLoc MethodNameLoc = DeclarationNameLoc::makeCXXOperatorNameLoc(
+      Context.getCXXOperatorSourceInfo(IntroducerRange.getBegin()));
   CXXMethodDecl *Method = CXXMethodDecl::Create(
       Context, Class, SourceLocation(),
       DeclarationNameInfo(MethodName, IntroducerRange.getBegin(),
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index f75e7df1a..69e219d53 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -9793,7 +9793,7 @@ ASTRecordReader::readDeclarationNameLoc(DeclarationName Name) {
 
   case DeclarationName::CXXOperatorName:
     return DeclarationNameLoc::makeCXXOperatorNameLoc(
-      getASTContext().getCXXOperatorSourceInfo(readSourceRange()));
+        getASTContext().getCXXOperatorSourceInfo(readSourceRange()));
 
   case DeclarationName::CXXLiteralOperatorName:
     return DeclarationNameLoc::makeCXXLiteralOperatorNameLoc(
diff --git a/clang/lib/Serialization/ASTReaderStmt.cpp b/clang/lib/Serialization/ASTReaderStmt.cpp
index 51115ac37..b8a12eb7a 100644
--- a/clang/lib/Serialization/ASTReaderStmt.cpp
+++ b/clang/lib/Serialization/ASTReaderStmt.cpp
@@ -2043,7 +2043,8 @@ void ASTStmtReader::VisitCXXDependentScopeMemberExpr(
   else
     E->Base = nullptr;
 
-  E->CXXDependentScopeMemberExprBits.OperatorLoc = readSourceLocation().getRawEncoding();
+  E->CXXDependentScopeMemberExprBits.OperatorLoc =
+      readSourceLocation().getRawEncoding();
 
   if (HasFirstQualifierFoundInScope)
     *E->getTrailingObjects<NamedDecl *>() = readDeclAs<NamedDecl>();
@@ -2230,7 +2231,8 @@ void ASTStmtReader::VisitSubstNonTypeTemplateParmExpr(
   E->Index = CurrentUnpackingBits->getNextBits(/*Width=*/12);
   E->PackIndex = Record.readUnsignedOrNone().toInternalRepresentation();
   E->Final = CurrentUnpackingBits->getNextBit();
-  E->SubstNonTypeTemplateParmExprBits.NameLoc = readSourceLocation().getRawEncoding();
+  E->SubstNonTypeTemplateParmExprBits.NameLoc =
+      readSourceLocation().getRawEncoding();
   E->Replacement = Record.readSubExpr();
 }
 
diff --git a/clang/tools/libclang/CIndex.cpp b/clang/tools/libclang/CIndex.cpp
index 0554e155e..e1a5d2690 100644
--- a/clang/tools/libclang/CIndex.cpp
+++ b/clang/tools/libclang/CIndex.cpp
@@ -175,10 +175,9 @@ CXSourceRange cxloc::translateSourceRange(const SourceManager &SM,
 }
 
 CharSourceRange cxloc::translateCXRangeToCharRange(CXSourceRange R) {
-   if (!R.ptr_data[0])
-     return CharSourceRange();
-   const SourceManager &SM =
-      *static_cast<const SourceManager *>(R.ptr_data[0]);
+  if (!R.ptr_data[0])
+    return CharSourceRange();
+  const SourceManager &SM = *static_cast<const SourceManager *>(R.ptr_data[0]);
   return CharSourceRange::getCharRange(
       SourceLocation::getFromRawEncoding32(SM, R.begin_int_data),
       SourceLocation::getFromRawEncoding32(SM, R.end_int_data));
@@ -7643,9 +7642,9 @@ CXString clang_getTokenSpelling(CXTranslationUnit TU, CXToken CXTok) {
   if (!CXXUnit)
     return cxstring::createEmpty();
 
-  SourceLocation Loc = SourceLocation::getFromRawEncoding32(CXXUnit->getSourceManager(), CXTok.int_data[1]);
-  auto LocInfo =
-      CXXUnit->getSourceManager().getDecomposedSpellingLoc(Loc);
+  SourceLocation Loc = SourceLocation::getFromRawEncoding32(
+      CXXUnit->getSourceManager(), CXTok.int_data[1]);
+  auto LocInfo = CXXUnit->getSourceManager().getDecomposedSpellingLoc(Loc);
   bool Invalid = false;
   StringRef Buffer =
       CXXUnit->getSourceManager().getBufferData(LocInfo.first, &Invalid);
@@ -7667,7 +7666,8 @@ CXSourceLocation clang_getTokenLocation(CXTranslationUnit TU, CXToken CXTok) {
 
   return cxloc::translateSourceLocation(
       CXXUnit->getASTContext(),
-      SourceLocation::getFromRawEncoding32(CXXUnit->getSourceManager(), CXTok.int_data[1]));
+      SourceLocation::getFromRawEncoding32(CXXUnit->getSourceManager(),
+                                           CXTok.int_data[1]));
 }
 
 CXSourceRange clang_getTokenExtent(CXTranslationUnit TU, CXToken CXTok) {
@@ -7682,7 +7682,8 @@ CXSourceRange clang_getTokenExtent(CXTranslationUnit TU, CXToken CXTok) {
 
   return cxloc::translateSourceRange(
       CXXUnit->getASTContext(),
-      SourceLocation::getFromRawEncoding32(CXXUnit->getSourceManager(), CXTok.int_data[1]));
+      SourceLocation::getFromRawEncoding32(CXXUnit->getSourceManager(),
+                                           CXTok.int_data[1]));
 }
 
 static bool getTokens(ASTUnit *CXXUnit, SourceRange Range,
@@ -7727,7 +7728,7 @@ static bool getTokens(ASTUnit *CXXUnit, SourceRange Range,
     if (!Tok.getLocation().getRawEncoding32(TokLocRaw))
       return false; // location is too big for libclang ABI
     CXTok.int_data[1] = TokLocRaw;
-    
+
     CXTok.int_data[2] = Tok.getLength();
     CXTok.int_data[3] = 0;
 
diff --git a/clang/tools/libclang/CXSourceLocation.cpp b/clang/tools/libclang/CXSourceLocation.cpp
index 6ea42f795..835efc914 100644
--- a/clang/tools/libclang/CXSourceLocation.cpp
+++ b/clang/tools/libclang/CXSourceLocation.cpp
@@ -216,7 +216,7 @@ int clang_Location_isInSystemHeader(CXSourceLocation location) {
   const SourceManager &SM =
       *static_cast<const SourceManager *>(location.ptr_data[0]);
   const SourceLocation Loc =
-    SourceLocation::getFromRawEncoding32(SM, location.int_data);
+      SourceLocation::getFromRawEncoding32(SM, location.int_data);
   if (Loc.isInvalid())
     return 0;
 
diff --git a/clang/tools/libclang/CXSourceLocation.h b/clang/tools/libclang/CXSourceLocation.h
index bc36db357..d514b0448 100644
--- a/clang/tools/libclang/CXSourceLocation.h
+++ b/clang/tools/libclang/CXSourceLocation.h
@@ -69,11 +69,10 @@ static inline CXSourceRange translateSourceRange(ASTContext &Context,
 }
 
 static inline SourceLocation translateSourceLocation(CXSourceLocation L) {
-   if (!L.ptr_data[0]) {
+  if (!L.ptr_data[0]) {
     return SourceLocation();
   }
-  const SourceManager &SM =
-      *static_cast<const SourceManager *>(L.ptr_data[0]);
+  const SourceManager &SM = *static_cast<const SourceManager *>(L.ptr_data[0]);
   return SourceLocation::getFromRawEncoding32(SM, L.int_data);
 }
 
diff --git a/clang/tools/libclang/Indexing.cpp b/clang/tools/libclang/Indexing.cpp
index da3f3f973..b62204664 100644
--- a/clang/tools/libclang/Indexing.cpp
+++ b/clang/tools/libclang/Indexing.cpp
@@ -995,7 +995,7 @@ void clang_indexLoc_getFileLocation(CXIdxLoc location,
 
 CXSourceLocation clang_indexLoc_getCXSourceLocation(CXIdxLoc location) {
   if (!location.ptr_data[0])
-  return clang_getNullLocation();
+    return clang_getNullLocation();
 
   CXIndexDataConsumer &DataConsumer =
       *static_cast<CXIndexDataConsumer *>(location.ptr_data[0]);
diff --git a/clang/unittests/Serialization/SourceLocationEncodingTest.cpp b/clang/unittests/Serialization/SourceLocationEncodingTest.cpp
index 63aec9a94..0c36feef6 100644
--- a/clang/unittests/Serialization/SourceLocationEncodingTest.cpp
+++ b/clang/unittests/Serialization/SourceLocationEncodingTest.cpp
@@ -32,10 +32,9 @@ void roundTrip(SourceLocation::UIntTy Loc,
   SourceLocation::UIntTy DecodedEncoded =
       SourceLocationEncoding::decode(ActualEncoded).first.getRawEncoding();
   ASSERT_EQ(DecodedEncoded, Loc) << "Decoding " << ActualEncoded;
-} 
+}
 
-constexpr SourceLocation::UIntTy MacroBit =
-    1ull << (SourceLocation::Bits - 1);
+constexpr SourceLocation::UIntTy MacroBit = 1ull << (SourceLocation::Bits - 1);
 constexpr SourceLocation::UIntTy Big = 1ull << (SourceLocation::Bits - 2);
 constexpr SourceLocation::UIntTy Biggest =
     llvm::maskTrailingOnes<uint64_t>(SourceLocation::Bits - 1);

@hokein
Copy link
Collaborator Author

hokein commented Jun 30, 2025

Although this is still a draft, it should be ready for an initial round of review. Any feedback or suggestions would be greatly appreciated.

@AaronBallman @cor3ntin @erichkeane @ilya-biryukov

Copy link
Collaborator

@erichkeane erichkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a light pass of the 1st 1/4 of this or so. The smuggling back and forth to raw-encoding seems strange? WHy are we doing that here?

Otherwise just minor questions/requested analysis.

FWIW, I'm reasonably comfortable with just pulling this band-aid, but the Area Team probably needs to make the determination (and we need to spend more time on this review).

@@ -682,6 +682,11 @@ class DeclarationNameTable {
DeclarationName getCXXLiteralOperatorName(const IdentifierInfo *II);
};

struct CXXOperatorSourceInfo {
SourceLocation::UIntTy BeginOpNameLoc;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could have sworn I just saw this structure created/modified in another patch about a week ago? Does this need a rebase?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting -- this patch was rebased recently on top of 0b6ddb0 (from 4 days ago), and I don't recall any issues during the rebase.

@@ -698,8 +703,7 @@ class DeclarationNameLoc {

// The location (if any) of the operator keyword is stored elsewhere.
struct CXXOpName {
SourceLocation::UIntTy BeginOpNameLoc;
SourceLocation::UIntTy EndOpNameLoc;
CXXOperatorSourceInfo* OInfo;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much does this indirection save perf wise?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the performance impact on this indirection -- this is for the C++ operation call, which is rare in practice (the perf here https://llvm-compile-time-tracker.com/compare.php?from=ff5a67315305f59f91041bad8b905e161b872442&to=38c519de69b127faa823078d3faff5670bc60209&stat=instructions:u)

@@ -682,6 +682,11 @@ class DeclarationNameTable {
DeclarationName getCXXLiteralOperatorName(const IdentifierInfo *II);
};

struct CXXOperatorSourceInfo {
SourceLocation::UIntTy BeginOpNameLoc;
SourceLocation::UIntTy EndOpNameLoc;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious why this is ::UIntTy instead of just the source location? I realize it was before, but now that we're indirecting the object, I wonder if the original reason still exists.

/// Construct location information for a non-literal C++ operator.
static DeclarationNameLoc makeCXXOperatorNameLoc(SourceRange Range) {
static DeclarationNameLoc
makeCXXOperatorNameLoc(CXXOperatorSourceInfo *OInfo) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a weird function to exist... am I correct in that it is only called 1x? I find myself thinkin perhaps we should just inline it manually.

return SourceLocation::getFromRawEncoding(CXXBoolLiteralExprBits.Loc);
}
void setLocation(SourceLocation L) {
CXXBoolLiteralExprBits.Loc = L.getRawEncoding();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find myself curious why we go back and forth to raw-encoding? I thought that is supposed to only happen when we are in a situation where SourceLocation isn't really visible, but it should be here, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, agree. I don't see a particular reason using the raw encoding here, I think we can use SourceLocation, will send out a separate cleanup patch.

@hokein
Copy link
Collaborator Author

hokein commented Jun 30, 2025

I did a light pass of the 1st 1/4 of this or so. The smuggling back and forth to raw-encoding seems strange? WHy are we doing that here?

In this patch, to keep AST node size as small as possible, we're using 40 bits of StmtBits to store the SourceLocation, we need this back-and-forth conversion.

For the DeclarationName, you're right — we can just store the SourceLocation directly and avoid the raw-encoding round-trip.

@erichkeane
Copy link
Collaborator

I did a light pass of the 1st 1/4 of this or so. The smuggling back and forth to raw-encoding seems strange? WHy are we doing that here?

In this patch, to keep AST node size as small as possible, we're using 40 bits of StmtBits to store the SourceLocation, we need this back-and-forth conversion.

For the DeclarationName, you're right — we can just store the SourceLocation directly and avoid the raw-encoding round-trip.

Sorry for being dumb... how are we getting away with only storing 40 bits? Does that not just artificially limit our source-location size for statements? And, frankly, then our entire space? I guess 8 2x-increases in source-location size are nice, but I was sort of hoping the switch to 64 bits meant we never had to discuss this again for the rest of my life:D

@hokein
Copy link
Collaborator Author

hokein commented Jun 30, 2025

Sorry for being dumb... how are we getting away with only storing 40 bits? Does that not just artificially limit our source-location size for statements? And, frankly, then our entire space? I guess 8 2x-increases in source-location size are nice, but I was sort of hoping the switch to 64 bits meant we never had to discuss this again for the rest of my life:D

Yeah, kind of. We can actually use up to 48 bits for SourceLocation, based on how we encode it for modules-- see the PR description for details.

The entire space will be 2^(SourceLocation::Bits-1). In the current implementation, adjusting the number of bits used is straightforward — it just changes SourceLocation::Bits. So I figured we could start with a “small” number (40 bits in this case, so that most Stmt's Loc can stay inside the StmtBits) and see if that’s sufficient in practice. If we ever need more, increasing it is relatively easy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants