This repository has been archived by the owner on Jul 14, 2018. It is now read-only.

Rust should integrate easily into large build systems #12

Open
aturon opened this issue Jan 31, 2017 · 87 comments

@aturon
Member

aturon commented Jan 31, 2017

Overview

When working with larger organizations interested in using Rust, one of the first hurdles we tend to run into is fitting into an existing build system. We've been exploring a number of different approaches, each of which ends up using Cargo (and sometimes rustc) in different ways, with different stories about how to incorporate crates from the broader crates.io ecosystem. Part of the issue seems to be a perceived overlap between functionality in Cargo (and its notion of compilation unit) and in ambient build systems, but we have yet to truly get to the bottom of the issues—and it may be that the problem is one of communication, rather than of some technical gap.

By the end of 2017, this kind of integration should be easy: as a community, we should have a strong understanding of best practices, and potentially build tooling in support of those practices. And of course, we want to approach this goal with Rust's values in mind, ensuring that first-class access to the crates.io ecosystem is a cornerstone of our eventual story.

Projects

At this point, we are still trying to assess the problems people face in this area. If you have experience here, please leave a comment with your thoughts!

@aturon aturon added the Vision label Jan 31, 2017
@luser

luser commented Jan 31, 2017

I started a repo not long ago to collect anecdotes about people integrating Rust into existing projects. I only have a few examples in there (and I keep seeing more crop up all the time). It'd be great to at least do a survey of all the examples we can find of people attacking this problem to see what the major issues were.

From the Firefox perspective, we have a pretty custom build frontend that generates Makefiles in the backend, so we would have had to write the custom integration bits regardless. We did find that life got a lot better once we started invoking cargo instead of rustc directly. I'm sure there are projects that would get value out of having individual Rust source files in their codebase, but it feels like a lot of the value in the Rust ecosystem comes from using cargo and leveraging crates.io (I doubt this is contentious).

@goertzenator

I've worked on build systems to compile Rust components for Erlang applications, so I'm in a good position to talk about a few issues. An overview of the building that takes place can be found in this README. Erlang can make use of bins, dylibs, and cdylibs.

  1. dylibs on OSX have special link requirements ("-- --codegen link-args='-flat_namespace -undefined suppress'") which creates a cascade of fussy work. Firstly, to provide that flag I need to "cargo rustc" instead of "cargo build", and to do that I need to detect all the binary/lib targets and build them one at a time. I really wish I could just "cargo build" and have cargo sort out the details for me. Maybe a "--extension-lib" flag for "cargo build" to apply this behavior? I understand this linking scenario is not unique to Erlang.

  2. Discovering and locating the output files is tricky. I have to "cargo read-manifest" to find the targets, form a platform-specific name from those results, then parse the flags for any "--target" flags to form a path for these files. I would love a flag of the form "--print-artifacts=[bin|lib|dylib|cdylib]" for "cargo build" to print the full output path and name to stdout.
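
As a rough illustration of the name-forming step described in item 2, the sketch below derives the file name and output directory that Cargo conventionally uses for an artifact. The names `Kind`, `Os`, `artifact_file_name`, and `artifact_dir` are hypothetical helpers, not part of Cargo; the sketch assumes the default `target` directory and covers only three operating systems.

```rust
use std::path::PathBuf;

/// Kinds of Cargo output discussed above (hypothetical enum, not a Cargo type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Kind {
    Bin,
    Rlib,
    /// Also used here for cdylib, which follows the same file naming.
    Dylib,
}

/// Operating systems that affect file naming (only three are covered).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Os {
    Linux,
    MacOs,
    Windows,
}

/// File name conventionally given to an artifact on each platform.
pub fn artifact_file_name(target_name: &str, kind: Kind, os: Os) -> String {
    // Library file names use underscores where the package name has hyphens;
    // binary names keep the hyphens.
    let lib = target_name.replace('-', "_");
    match (kind, os) {
        (Kind::Bin, Os::Windows) => format!("{}.exe", target_name),
        (Kind::Bin, _) => target_name.to_string(),
        (Kind::Rlib, _) => format!("lib{}.rlib", lib),
        (Kind::Dylib, Os::Linux) => format!("lib{}.so", lib),
        (Kind::Dylib, Os::MacOs) => format!("lib{}.dylib", lib),
        (Kind::Dylib, Os::Windows) => format!("{}.dll", lib),
    }
}

/// Output directory: `target/<profile>`, or `target/<triple>/<profile>`
/// when a `--target` triple was passed. Assumes the default target directory.
pub fn artifact_dir(target_triple: Option<&str>, release: bool) -> PathBuf {
    let mut dir = PathBuf::from("target");
    if let Some(triple) = target_triple {
        dir.push(triple);
    }
    dir.push(if release { "release" } else { "debug" });
    dir
}
```

Joining `artifact_dir(...)` with `artifact_file_name(...)` gives the path a wrapper build system would have to guess today; a flag that prints it would make this kind of helper unnecessary.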

@bobsomers

This is great. Integrating Rust/Cargo with Bazel would be the first major hurdle to us using more Rust in our codebase at work, which is a pretty large mass of C++ code with some Python sprinkled throughout.

Since both Bazel and Cargo are fairly opinionated about how builds and package management should work, and since I am an expert in neither, it's not immediately clear to me which build system should be doing what or if we should just try to integrate rustc into Bazel without Cargo at all. Using strictly Cargo is (unfortunately) probably out of the question since most of the C++ and Python packages are dependency tracked with Bazel BUILD files, and any serious integration with the C++ code would require our Rust libraries and binaries to be dependency tracked by Bazel as well.

@steveklabnik
Member

steveklabnik commented Feb 6, 2017 via email

@mfarrugi

mfarrugi commented Feb 7, 2017

Bazel has support for local sources, which is to say it does not support crates.io. There's an issue open for it (bazelbuild/rules_rust#2), but the project is relatively inactive. One would need to keep a local repository clone of a crate and all its dependencies to make bazel happy without modifications.

Bazel's rust targets also can't be depended on by c/c++ at the moment.

The kythe project has wrappers for bazel to call cargo, but it's not the most robust approach to integration. For reference, kythe:tools/build_rules/rust.

disclaimer: I don't know too much about bazel, happened to learn the above looking into it last week.

@withoutboats

Bazel's been mentioned several times when this has come up. I believe Dropbox needs to integrate its Rust into a Bazel build process, and I think I might have been told Facebook uses it as well (but it might also have been that Facebook has an internal tool that is similar to Bazel). It seems like a promising tool to look into for this issue.

@davidzchen

Bazel Rust Rules author here. Apologies for the inactivity on the rules_rust project; I have been busy with other projects on my plate. I am planning to implement the workspace rules for pulling crates from Cargo (bazelbuild/rules_rust#2) by the end of Q1 with a stretch goal of also implementing tooling for automatically generating Bazel BUILD files from Cargo.toml (bazelbuild/rules_rust#3), though the latter will likely extend into Q2.

The Kythe project has rules that shell out to Cargo directly as a quick stop-gap measure, but those rules are meant to be used internally in Kythe for now (since they are not hermetic, for instance) and the plan is to replace them with the rules in rules_rust once features such as pulling from Cargo are supported.

Of course, if anyone is interested in helping with improving the Bazel rules for Rust, contributions are certainly welcome. :)

+cc @damienmg

@shahms

shahms commented Feb 7, 2017

Kythe contributor here. We'd definitely like to see better integration with Cargo from the upstream Bazel Rust rules. Our extant integration was very much a hack to allow our intern to make progress on the Rust indexer itself, rather than getting bogged down with Bazel integration.

@LegNeato

LegNeato commented Feb 7, 2017

Facebook uses Buck, and there is some early Rust support:

https://buckbuild.com/rule/rust_binary.html

Facebook vendors their dependencies in-tree.

@jsgf

jsgf commented Feb 7, 2017

Facebook uses Buck, and there is some early rust support:

I've spent quite a bit of time on that over the last few months, and it's getting pretty solid now. It's well integrated with the overall build/test system and (most recently) can also interop with cxx rules.

@jpakkane

jpakkane commented Feb 7, 2017

I'm the main author of the Meson build system, which is used by GStreamer and a bunch of other projects. We also have Rust support, which is a bit rudimentary but can be used to build stuff like a Python extension module that uses C, C++, Rust and Fortran in a single target. We aim to improve Rust support. This is especially important for mixed-language projects, since Cargo is nice for plain Rust projects but I'm fairly certain that Cargo developers do not want to add first class multiplatform C/C++ build support to Cargo.

@larsbergstrom

A key part of doing this well will involve handling updates to the dependency graph in a deliberate and piecemeal way, particularly in scenarios where upstream "master" has moved to a new version (e.g., as in rust-lang/cargo#2649).

We (@nox, @SimonSapin, and I) have had many conversations with @alexcrichton and @wycats on this front, and I believe the leading contender from their point of view is for some extensions to fix up paths and avoid the abuse of [replace], as it runs immediately into version-related roadblocks for any non-trivial project.

@moretea

moretea commented Feb 7, 2017

Nix (which has a similar pure view of build systems as Bazel) uses a trick that involves cloning a well known version of the crates index git repository, see https://github.com/NixOS/nixpkgs/tree/master/pkgs/build-support/rust

@sjmackenzie

Sadly the nixpkgs crates index is not hermetically sealed, and it gave us many problems. So we implemented a crate index nixifier, which reads crates.io-index and spits out a nixified crates index. This allows one to use nix to completely manage transitive crate dependencies of a project without needing cargo. The repo: https://github.com/fractalide/nixcrates

@acmcarther

I've taken a look at teaching bazel how to digest Cargo's toml files to pull down third party crates.io dependencies automatically, and there don't seem to be very many sticking points.

A couple of rough patches I've seen though:

  • Cargo exports some environment variables that projects can sometimes come to depend on. This happened once with clap. This is super minor though.
  • target.*.dependencies rules can be hard to deal with since they (seem?) to use rust inlined into the toml. My current approach is to just include them all, and hack around any platform specific deps that don't play nice on my platform.
  • build.rs files seem antithetical to bazel's "hermetic ethos", since they can do pretty much anything. I think this will become less of an issue in the very near future since common usecases such as serde are being resolved with the stabilization of macros 1.1.

@tupshin

tupshin commented Feb 7, 2017

Just commenting here because I believe there is a lot of opportunity to appeal to the JVM ecosystem and to incrementally replace Java/Scala/etc. code with Rust, if we are able to make it trivial to incorporate Rust into ant/maven/etc. builds, as well as to go the other way: add a jar to a Cargo.toml or build.rs and have Rust bindings generated automatically, maybe using a combination of https://github.com/kud1ing/rucaja/ and https://github.com/kenpratt/jvm-assembler as starting points. This could all happen outside core, obviously, but Rust adoption by the very large, and as-yet untapped-by-Rust, ecosystem of JVM code and developers would be quite beneficial.

@Ericson2314

I've proposed haskell/cabal#3882 for Cabal. The same thing can work for Cargo, and would solve this problem for everyone.

@luser

luser commented Feb 7, 2017

@raphlinus and I were discussing some related issues on IRC not long ago. One of the ideas that he floated was a way to make cargo simply output the commands that it would run to do the build, so that we could leave parsing Cargo.toml to cargo, but allow other build systems like bazel to run the build like they expect.

@alexcrichton
Member

Thanks for the comments everyone! I and a few others have thought a lot about this in the past as well, and I wanted to jot down some notes and conclusions that we've reached historically.

First and foremost we've historically concluded that build system integration is not finished until you've got access to crates.io crates. The standard library is purposefully conservative and small in size with the explicit intent of having rich functionality in the ecosystem on crates.io. If an integration doesn't allow easy access to crates.io, then there's more work yet to be done!

Today, of course, Cargo is the primary gateway into the crates.io ecosystem. Cargo is also the primary build tool for Rust, but there are perennial questions about how to integrate Cargo into existing build tools. Many issues have been solved over time in this vein, such as vendoring dependencies, workspaces, etc. Cargo also has the benefit of being friendly and familiar to existing Rust programmers with a shared workflow across the ecosystem.

Something else that we've concluded, however, is that preserving Cargo workflows should not necessarily be a hard constraint for build system integration. Existing projects already have a workflow associated with them, and Rust code should fit into that workflow rather than impose restrictions on how it works. Of course, though, preserving a Cargo-based workflow for the Rust-specific portions is a nice-to-have!

And finally one last thing we've talked about is compilation units. For example C/C++ have files as compilation units and that's typically what build systems for C/C++ are normally architected around. Rust, however, doesn't have this granularity of compilation unit. Fundamentally the compiler supports crates as the compilation unit. Moving up the stack to Cargo it ends up generally being the case that the entire crate graph is Cargo's compilation unit (one command outputs the entire crate graph). The question is then how does this integrate into an existing build system? Is the crate graph sufficient? Does the granularity need to be finer, such as crates? (I'm not sure!)

One part to consider about compilation units is that they typically heavily affect caching in build systems. For example distributed caches may cache entire compilation units but nothing more granular. This means that a DAG-as-a-unit would probably be too coarse for caching. On the other hand, though, with crate-as-a-unit it is not clear how to integrate build systems and Cargo today.

So with all that I think we're faced with a few problems that may be thorny to solve:

  • If we go with DAG-as-a-unit, is this sufficient? Can Cargo hook into existing caching infrastructure adequately? I believe this is how projects like Gecko work today where the whole Rust DAG is a unit and cargo is used to build it. This may have problems, however, if there are multiple Rust projects to link together (e.g. stylo and spidermonkey in Gecko both independently having Rust code)

  • If we go with crate-as-a-unit, how can we get Cargo and build systems to cooperate? Does Cargo need finer-grained operation? Should Cargo generate build files for each target build system? My assumption is that supporting features like build scripts may be very difficult in foreign build systems, but can we get by without build scripts in some systems?

Unfortunately I don't have a whole lot of solutions just yet; I'm personally still trying to grapple with the problem space. @luser does the above sound accurate though for Gecko's Rust integration, at least on a high level? @jsgf could you detail some of the work you've done at a high level for Buck for Rust support?

I've found that each build system tends to have its own unique set of constraints for integration, but the more we know the easier we can accommodate everyone!

@alexcrichton
Member

Oh one point I should also mention is that I personally think that it's at least relatively important to try to lean on Cargo as much as possible with build system integration. Cargo is the bread and butter of building Rust, and avoiding Cargo leaves build systems a massive number of features to reimplement. I'd much rather pursue avenues to add features and/or make Cargo more flexible to interoperate with existing build systems. For example I could imagine Cargo generating build files or working in a much more granular fashion assuming another process manages inputs/outputs.

@jsgf

jsgf commented Feb 8, 2017

The things that make Cargo awkward to use in our environment are:

  • It downloads things at build time; we need to be able to nail down external dependencies and be explicit when they're updated.
  • It's awkward to have dependencies on C++ code. We have tons of existing C++ code, and abandoning it is a non-option. Restricting Rust to pure code would make it unsuitable for many potential users - as a result, I've been spending some time to make integration with C++ as straightforward as possible, which also means being able to take C++ as a dependency using its native build system
  • Cargo does too much. We have an extensive distributed caching system for build artifacts which tries to use the cache to avoid as much build work as possible. If we were to use Cargo to build it would be all or nothing - either it would not be invoked if nothing needs to be rebuilt, or run to build everything once
  • Cargo doesn't do enough. Each Cargo.toml has its own set of dependencies, and those dependent crates get rebuilt for each Cargo.toml that depends on them. Using workspaces allows those deps to be shared, but only with a very specific arrangement of directories. It doesn't scale to a large single source base containing thousands of distinct projects, some of which may be Rust, with an organizational scheme that's something other than their shared dependencies.
  • build.rs doesn't really work - building an executable then running it in the build infra is pretty awkward, and not well regarded. Making it work well in a semi-cross build environment is also tricky (not cross-arch, but different library versions in the build execution env vs the prod env).

What I think you're saying is that going directly to rustc is too low-level for your taste; you'd prefer to have a higher-level tool that's actually coordinating builds. But on the other hand, cargo is too high-level for our purposes. As a standalone tool I think it's excellent, but it tries to impose too many opinions to interact well with other build environments.

Perhaps there's some scope for a mid-level tool that provides cargo's mechanisms, but not the UI, and a higher-level tool that presents a nicer UI/user experience when it's used as the primary build mechanism. Or perhaps rustc itself is that interface, and it just needs to be designed accordingly?

Right now I'm handling all this by using cargo to download crates.io crates and manage all their dependencies, then prebuilding them and keeping all the build artifacts. All our internal builds are built with buck using its dependency management, and ultimately linked with the prebuilt crates.io code. That way cargo is a one-time operation rather than something that's involved with every build, while still taking advantage of it to manage all the code that's intended to be built with it.

@Ericson2314

Ericson2314 commented Feb 8, 2017

@alexcrichton DAG-as-a-unit will always be too coarse-grained. That is basically what we have now, and it is not good enough. For crate-as-a-unit though, I'd think build.rs would not be a problem because any dynamism from build.rs only affects the current crate, right? The dependency graph with crate-granularity is still static.

So yeah, what needs to be done at a minimum is making two new modes of operation for Cargo:

  1. Make a complete plan (DAG with crate nodes); this is way way more than a lockfile. Do impure things like downloading crates.io indices and crates.io pkg sources here too.

  2. Build one crate/node from the pre-made plan. Assume every node gets its own $out directory, and paths to all (transitive) dependencies' $out directories will be passed to Cargo in this mode.
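
To make the two modes a little more concrete, here is a minimal sketch of what a crate-granularity plan could look like to an external driver. It is only an illustration under stated assumptions: `Plan`, `build_order`, and `transitive_deps` are hypothetical names, not Cargo APIs, and the plan is just a map from each crate to its direct dependencies.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// A build plan: each crate name maps to the crates it directly depends on.
/// (Hypothetical structure for illustration, not a Cargo data type.)
pub type Plan = BTreeMap<String, BTreeSet<String>>;

/// Mode 1 output: crates in an order where every crate follows its
/// dependencies. Returns `None` if the plan has a cycle or names a
/// dependency that is missing from the plan.
pub fn build_order(plan: &Plan) -> Option<Vec<String>> {
    let mut done: BTreeSet<String> = BTreeSet::new();
    let mut order = Vec::with_capacity(plan.len());
    while order.len() < plan.len() {
        // Crates not yet built whose dependencies have all been built.
        let ready: Vec<String> = plan
            .iter()
            .filter(|(name, deps)| {
                !done.contains(*name) && deps.iter().all(|d| done.contains(d))
            })
            .map(|(name, _)| name.clone())
            .collect();
        if ready.is_empty() {
            return None; // cycle, or a dependency absent from the plan
        }
        for name in ready {
            done.insert(name.clone());
            order.push(name);
        }
    }
    Some(order)
}

/// Mode 2 input: all transitive dependencies of `name`, i.e. the nodes whose
/// `$out` directories a per-crate build step would need to be given.
pub fn transitive_deps(plan: &Plan, name: &str) -> BTreeSet<String> {
    let mut seen = BTreeSet::new();
    let mut stack: Vec<&str> = vec![name];
    while let Some(current) = stack.pop() {
        if let Some(deps) = plan.get(current) {
            for dep in deps {
                // Only descend into a dependency the first time it is seen.
                if seen.insert(dep.clone()) {
                    stack.push(dep);
                }
            }
        }
    }
    seen
}
```

Because the plan is static data, an outer build system could run each node as its own cacheable action instead of treating the whole graph as one step.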


A cool follow-up to this would be writing a dependency-management library that allows either serializing plans so that external tools can drive the build, or executing the plan in the current process. This would avoid any code duplication in Cargo and, I'd guess, be useful for rustbuild, too.

See haskell/cabal#4174 for analogous effort with Cabal and Shake (though, unfortunately, Shake does not allow exporting static dependency graphs like this).

@davidzchen

I also echo @jsgf's point about build.rs not being ideal. build.rs is used by Rust projects to compile code in other languages, such as C++ libraries using gcc-rs, but this should really be handled by the build system itself.

For example, the Bazel rust rules support interop with C/C++ code, meaning that you can have the following:

cc_library(
    name = "foo",
    srcs = ["foo.cc"],
    hdrs = ["foo.h"],
)

rust_library(
    name = "bar",
    srcs = ["src/lib.rs"],
    deps = [":foo"],
)

Also, of note, while one mode of using Bazel is to vendor all dependencies (which is the practice followed by Google internally), Bazel also supports fetching external dependencies (which are all done prior to build time) and provides a simple API for writing repository rules. As mentioned above, a rule that fetches crates from Crates.io is in the pipeline for users who do not prefer to vendor all dependencies.

@alexcrichton
Member

Thanks for typing that up @jsgf!

It downloads things at build time

To clarify, I'm under the impression that this is a solved problem today. With multiple vendoring options available that was at least the intention! Did you find, though, that the vendoring support wasn't suitable for Buck's use case?

It's awkward to have dependencies on C++ code.

To clarify, this is from a build system perspective, not a language perspective, right? If possible I'd like to focus this thread at least on just the build system aspect and we can perhaps continue the language discussion over at #14 :)

It definitely makes sense to me that it's difficult to depend on C++ code in a build-system sense. Some of this I think is the granularity of builds today (DAG vs crate) but in general I think it's just flat out unergonomic and difficult to plug preexisting artifacts into a Cargo build.

We have an extensive distributed caching system for build artifacts which tries to use the cache to avoid as much build work as possible.

Definitely makes sense! I don't think it's out of the question for Cargo to support custom caching, though. In fact, with sccache we may get exactly this!

In general I'd like to keep an open mind to Cargo's current implementation today, and we can basically extend it in any way we see fit. For example Cargo's already got enough information to create a unique hash key for a crate and we could restructure it with custom caching to pull in artifacts on demand (or assume they're at a predetermined location) or something like that.

Not saying this is a silver bullet of course, but our options are still open!

Using workspaces allows those deps to be shared, but only with a very specific arrangement of directories. It doesn't scale to a large single source base containing thousands of distinct projects, some of which may be Rust, with an organizational scheme that's something other than their shared dependencies.

I'm not sure I quite understand this constraint, so I wonder if we could dig in a bit? I definitely agree that a workspace may not scale to thousands of crates and projects, but the idea of a Cargo.toml certainly should, right?

I guess I'm not fully understanding what's not scaling here. Are you thinking this is a fundamental compiler limitation? Or just something that needs working around in Cargo today? As with above, I'd like to keep in mind the possibility of changes to Cargo to make it more amenable to situations like this rather than assuming the functionality of today is impossible to change!

build.rs doesn't really work - building an executable then running it in the build infra is pretty awkward, and not well regarded

Yeah I can definitely understand how this may be nonstandard. I don't think this is something that can be sidestepped for too long, though, as a concept. Custom derive (macros 1.1) was stable in Rust 1.15, and that requires compiling a plugin at build time to then run inside the compiler. I would draw such a practice as very similar to build.rs (in principle at least), and I'd expect that ergonomically using Rust will basically require using Serde in the near future (especially for communicating services).

In that sense, is it literally the build.rs with inputs/outputs itself? Or is it the concept of running code at build time that may cause problems? I'd definitely argue that macros 1.1 is more principled than build.rs (defined set of inputs/outputs) but, naively, to me at least they don't seem fundamentally different.

Perhaps there's some scope for a mid-level tool that provides cargo's mechanisms, but not the UI, and a higher-level tool that presents a nicer UI/user experience when it's used as the primary build mechanism.

I definitely agree! I do think that Cargo's too high level for Buck's use case today, and I personally feel that rustc will almost always be too "low level" to get real benefit. As we continue to add features to Rust, the compiler, and Cargo, the idiomatic and most ergonomic way to consume these features will be through Cargo. For example macros 1.1 might be an absolute nightmare if you had to manage all the builds yourself, especially when cross compiling.

I personally think that rustc sits at the right level of abstraction for Cargo to be calling, so we wouldn't want to soup it up too much. Taking Cargo down to be a bit lower level though I think is where there's a lot of benefit to be had. With that in place we'd then design new language features with such a tool in mind to ensure the experience is smooth for everyone, Cargo users and "this lower level Cargo" users alike.


@davidzchen thanks for the input about Bazel! I'm curious on your thoughts about my comments above related to build scripts as well. Do you agree that compiler plugins (e.g. macros 1.1) are along the same vein as build scripts, or is one much easier to support in Bazel than the other? Or are they both very difficult to support?

@davidzchen

davidzchen commented Feb 8, 2017

@alexcrichton Regarding compiler plugins and build scripts, the way I see it, the key difference between the two is that people write compiler plugins when they have a need to reuse functionality in the Rust compiler whereas people write build.rs to do anything they cannot do via the Cargo build system itself. As a result, compiler plugins are, in practice, used for much more niche use cases than build.rs.

One analogy that I can draw is that build.rs is similar to Bazel's genrule, which allows you to run any arbitrary shell command.

In any case, neither of these are difficult to support in Bazel:

  • For compiler plugins, Bazel has precedent for a feature like this: the java_plugin rule, which is used for running Java compiler plugins.
  • For build.rs, a quick and dirty solution would be to build it with a rust_binary rule and then run it in a genrule (and perhaps wrap these two in a rust_buildscript macro for ergonomics).

A main concern that I have with relying too heavily on build.rs is that what it runs is arbitrary and is (often) potentially non-hermetic. As a result, for projects that have a mix of different languages with interop between Rust and other languages, it would be better to rely more on the build system for this and have more fine-grained build targets and limit the use of build.rs as much as possible.

@jsgf

jsgf commented Feb 8, 2017

@alexcrichton:

To clarify, I'm under the impression that this is a solved problem today

Mechanically it's solved because --frozen will prevent any attempts to download, and in practice I handle the whole problem by prebuilding all the parts of crates.io that our code needs. Is there an option to reverse this, so that all downloads are prohibited by default, and only allowed if there's an explicit option or command? Might be useful if not.
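
Until such a default exists, one workaround is to make sure every invocation from the outer build system goes through a wrapper that always passes `--frozen`. The sketch below only constructs the command and does not run it; `frozen_build` is a hypothetical helper name, and it assumes `cargo` is on the `PATH`.

```rust
use std::process::Command;

/// Build a `cargo build` invocation that refuses to touch the network
/// or update the lockfile. (Hypothetical wrapper, not part of Cargo.)
pub fn frozen_build(manifest_path: &str, release: bool) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.arg("build")
        .arg("--frozen")
        .arg("--manifest-path")
        .arg(manifest_path);
    if release {
        cmd.arg("--release");
    }
    cmd
}
```

A caller would then run it with something like `frozen_build("Cargo.toml", true).status()`, and treat a failure as a sign that a dependency was not vendored or prebuilt ahead of time.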

To clarify, this is from a build system perspective, not a language perspective, right?

Yes. We have C++ libraries that have their own complex dependency graphs managed by Buck that I'd like to add Rust FFI bindings for, and make sure that everything gets rebuilt properly. I'd also like to be able to expose Rust libraries to C/C++ code (mostly for things like python extensions), and again, make sure the build system knows all the deps. Trying to manage dependencies across build systems seems like it could be awkward.

I don't think it's out of the question for Cargo to support custom caching, though

Do you mean Cargo might be able to make use of Buck's cache? That poses lots of problems, not least because it's unclear how Cargo would be able to compute the correct key. Buck's cache is indexed not only by the immediate dependency (the source file contents), but also by the keys of all its dependencies, with the goal of being able to skip as much of the dependency graph as possible. Cargo wouldn't have access to the information needed to either look up or insert blobs into the cache.

Effectively Buck treats the compiler as a pure function of inputs -> output, and memoizes the result. If the build tool is more complex than that, then Buck can't cache its state well, and it complicates the interface to the build tool if it's doing its own caching/memoization.
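
The keying scheme described above can be sketched roughly as follows. This is an illustration of the general idea only, not Buck's actual data model or hash function: `Node`, `Graph`, and `cache_key` are hypothetical names, a string stands in for source file contents, and the standard library's `DefaultHasher` stands in for a real content hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// One node in the build graph (hypothetical, for illustration only).
pub struct Node {
    /// Stand-in for the contents of the node's source files.
    pub source: String,
    /// Names of the node's direct dependencies.
    pub deps: Vec<String>,
}

/// The whole build graph, keyed by node name.
pub type Graph = BTreeMap<String, Node>;

/// Key of a node: a hash of its own sources plus the keys of its
/// dependencies, so a change anywhere below a node changes its key too.
/// Assumes the graph is acyclic and every dependency is present; a real
/// implementation would memoize keys instead of recomputing them.
pub fn cache_key(graph: &Graph, name: &str) -> u64 {
    let node = &graph[name];
    let mut hasher = DefaultHasher::new();
    node.source.hash(&mut hasher);
    // Sort so the key does not depend on the order deps were listed in.
    let mut deps: Vec<&String> = node.deps.iter().collect();
    deps.sort();
    for dep in deps {
        dep.hash(&mut hasher);
        cache_key(graph, dep).hash(&mut hasher);
    }
    hasher.finish()
}
```

A tool that does not see the full graph cannot compute such a key, which is the difficulty with having Cargo look up or insert cache entries on its own.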

I haven't looked at sccache in detail, but this is quite different from how something like ccache works; ccache caches the results of individual compiles, but doesn't take the dependency graph into account.

I'm not sure I quite understand this constraint, so I wonder if we could dig in a bit? I definitely agree that a workspace may not scale to thousands of crates and projects, but the idea of a Cargo.toml certainly should, right?

Yeah, I was being pretty unclear.

Without workspaces, every binary cargo package has a dependency graph on other packages with library crates, and building that executable builds them all as needed. If multiple binaries share some or all of the same crates, then they all get rebuilt regardless.

You can use workspaces to effectively share the dependencies between multiple binary crates so that the library crates only get built once. But to achieve this, all the crates - binary and library - must be in a single workspace.

There's a few issues with this:

  • workspaces are strictly hierarchical, so effectively you need to configure a workspace to encompass the entire source tree
  • but the hierarchy is a single level, so all the packages have to be peers (at least logically)
  • the Cargo.lock file for the entire workspace might end up being massive if there's lots of dependencies, and it ends up being a flat dependency graph (I'm not actually sure if this is a problem)

I know there's been some discussion about loosening the constraints on workspaces to either allow nesting or have other relationships (esp with path dependencies), but it's not clear to me they're the right way of modelling a complex dependency DAG in a way that minimizes building.

(Also I haven't really looked at workspaces in a while, so perhaps I'm completely out of date here, or just wrong.)

Yeah I can definitely understand how this may be nonstandard. I don't think this is something that can be sidestepped for too long, though, as a concept. Custom derive (macros 1.1) was stable in Rust 1.15, and that requires compiling a plugin at build time to then run inside the compiler

I haven't looked at macros 1.1 yet, but are you saying that every crate that uses - say - serde will need to also build a compiler component, or is it just when serde itself is built? If it's the latter then I can handle that when I pre-build the crates.io crates, and it is basically no more difficult a constraint than object files being compiler-version dependent.

If they need to be rebuilt for every user, then yeah, that's trickier.

The more general problem with build.rs is building an executable then running it as part of the build process. It's awkward to manage because it has unconstrained inputs and outputs (it can read and write arbitrary files) which means that it's opaque to the build system/dependency management.

Of the use cases listed in the docs, "Building a bundled C library", "Finding a C library on the host system" and "Performing any platform-specific configuration needed for the crate" are all pretty horrifying from a build integrity/reproducibility perspective - they are strong antipatterns. The only one that makes any sense is "Generating a Rust module from a specification" (i.e., generated source), but that could be done with a much more constrained interface (and perhaps macros 1.1 is that interface).

There's also the general security problem of just running random binaries on a build host that can do arbitrary things. It can be managed, but the less it happens the better.

I personally feel that rustc will almost always be too "low level" to get real benefit

rustc is about the right level for Buck, since it's similar to gcc/javac/etc; certainly integrating at the rustc level (while not trivial) was more conceptually straightforward than trying to work out a conceptual mapping between buck and cargo.

What might be useful is:

  • "low-level cargo" provides an interface to a set of build recipes like "build me a .rlib", "build me an executable", "build me a compiler plugin", "tell me your dependencies/input sources"
  • a standardized format for dependency interchange, so that buck could emit - say - blob of json describing what it knows so that the cargo-like tool can make use of it, and perhaps vice-versa (this would be useful for other tools like RLS rather than relying on Cargo.toml/cargo directly)

Buck and Bazel are extremely similar in a lot of ways, so I expect that a solution that works for one will likely help with the other. As @davidzchen mentioned, the concept of a compiler plugin is very important for Java, so Buck can also deal with it (since Java/Android is one of its primary use-cases); extending the concept to Rust is reasonably straightforward.

@petrochenkov

This thread is a delight to my eyes.
It covers all the stuff about Cargo's scalability and integration that I wanted to talk about but didn't have enough factual evidence and practical experience for. @jsgf @davidzchen thanks a lot for the details!
Even if rustc+stdlib themselves are not especially large as a project, these issues already show up at the rustbuild level, which favors "leverage Cargo and Rust as much as possible" over build system best practices. I hope this discussion will benefit it as well in the end.

@luser

luser commented Feb 8, 2017

If we go with DAG-as-a-unit, is this sufficient? Can Cargo hook into existing caching infrastructure adequately? I believe this is how projects like Gecko work today where the whole Rust DAG is a unit and cargo is used to build it. This may have problems, however, if there are multiple Rust projects to link together (e.g. stylo and spidermonkey in Gecko both independently having Rust code)

In Gecko we are effectively limiting things to a single crate per output binary in the build system. We haven't crossed the "Spidermonkey requires Rust" bridge yet, but when we do we will probably just have it behind the existing "building the JS engine standalone" flag, and otherwise have that code pulled in via the crate that gets linked into libxul.

We've discussed this in other forums when we first ran into this issue, I know. The core problem was that when outputting something other than an rlib rustc includes support code such as jemalloc, and you can only link one copy of that into a binary.

@luser

luser commented Feb 8, 2017

Mechanically it's solved because --frozen will prevent any attempts to download, and in practice I handle the whole problem by prebuilding all the parts of crates.io that our code needs. Is there an option to reverse this, so that all downloads are prohibited by default, and only allowed if there's an explicit option or command? Might be useful if not.

We've discussed this before, but no, there's not. I have a cargo issue open for making our Gecko use case nicer. For Gecko we currently vendor all our crates.io dependencies with cargo vendor, and use a cargo config file to enable source replacement.

I haven't looked at sccache in detail, but this is quite different from how something like ccache works; ccache caches the results of individual compiles, but doesn't take the dependency graph into account.

This isn't implemented yet (I'm working on it this quarter), but we're planning on making sccache able to cache rust compilations at the crate level. There's a good writeup of the plan here. This is different from ccache, which operates at the object file level, but Rust compilation is fundamentally different from C compilation.
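To make the crate-level idea concrete, here is a minimal sketch of a cache key that takes the dependency graph into account. It is not sccache's actual scheme: the `CrateCompile` type and `cache_key` function are hypothetical names invented for illustration, and `DefaultHasher` is used only to keep the example dependency-free (a real cache would want a stable content hash).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Everything that can change the output of one crate compilation.
/// Hypothetical model, not sccache's real data structure.
pub struct CrateCompile<'a> {
    pub rustc_version: &'a str,
    pub args: &'a [&'a str],
    /// (path, contents) for every source file of the crate.
    pub sources: &'a [(&'a str, &'a str)],
    /// Cache keys of the dependency rlibs this crate is compiled against.
    pub dep_keys: &'a [u64],
}

/// Fold all inputs into a single key. Because the keys of the dependencies
/// are part of the input, a change anywhere below this crate in the graph
/// produces a different key for this crate as well.
pub fn cache_key(c: &CrateCompile) -> u64 {
    let mut hasher = DefaultHasher::new();
    c.rustc_version.hash(&mut hasher);
    c.args.hash(&mut hasher);
    c.sources.hash(&mut hasher);
    c.dep_keys.hash(&mut hasher);
    hasher.finish()
}
```

Computing `cache_key` bottom-up over the crate graph gives each crate a key that only stays the same if nothing it depends on has changed, which is the property an object-file-level cache like ccache does not need.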

@alexcrichton
Member

Ok I had some discussion with @aturon the other day and we felt that the discussion here has led to at least two concrete courses of action to start tackling this issue. To that end I've opened up two new issues on the Cargo issue tracker:

Our thinking is that specific technical discussion about those two issues should move towards those issues specifically. Otherwise more general discussion should of course continue here!

@eternaleye

eternaleye commented Mar 13, 2017

One concern I have is that the "build plan", as described, sounds like a "concrete build plan" - that is, one in which things like Cargo features have been bound, rather than remaining free.

This is not compatible with some distros. In particular, on Exherbo, it's desirable to surface the Cargo features of crates as Exheres options on the corresponding packages. As a result, generating Exheres from crates requires something closer to Haskell's GenericPackageDescription (where flags are free) than its PackageDescription, in which they are bound.

This corresponds to a somewhat different set of phases. The build specializer is the new one; the others appear in the prior listing:

  • crate dependency management
  • a build specializer
  • a build planner
  • a build execution engine

The new phase is performed by the system package manager, and this results in a very different workflow. The system PM thus asks Cargo to extract a data structure that has some number of free parameters that, when all are bound, can be queried for a build plan.

In this system, Cargo is used to extract this data structure. This data structure is then converted to a system package, which then has those free parameters bound by the system package manager (based on end user configuration). The system package manager then constructs a download plan (of crate sources, though the system PM would much prefer if there was a stable ABI, and it could thus construct the inter-crate build plan entirely by itself), and then invokes Cargo to plan and execute the build. The download plan and the Cargo build plan being mismatched is an error.

As a result, unlike the above buildsystem use cases (which basically call in to Cargo), Exherbo's use case basically wants to bookend its own model with exporting information from Cargo, and then reinjecting it to Cargo.
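The free-versus-bound distinction described above can be sketched roughly as follows. All names here (`GenericPackage`, `bind`, `check_plans`) are hypothetical and do not correspond to any Cargo API; the sketch only assumes that each optional feature pulls in some extra dependencies.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// A package description whose Cargo features are still free.
/// Hypothetical type, loosely analogous to GenericPackageDescription.
pub struct GenericPackage {
    /// Dependencies that are always required.
    pub base_deps: Vec<String>,
    /// Extra dependencies pulled in by each optional feature.
    pub feature_deps: BTreeMap<String, Vec<String>>,
}

impl GenericPackage {
    /// The free parameters a system package manager could surface as options.
    pub fn free_features(&self) -> Vec<&str> {
        self.feature_deps.keys().map(|k| k.as_str()).collect()
    }

    /// Bind the features (everything not listed is off) and return the
    /// concrete dependency set that a build plan would be derived from.
    pub fn bind(&self, enabled: &[&str]) -> Result<BTreeSet<String>, String> {
        let mut deps: BTreeSet<String> = self.base_deps.iter().cloned().collect();
        for feature in enabled {
            let extra = self
                .feature_deps
                .get(*feature)
                .ok_or_else(|| format!("unknown feature: {}", feature))?;
            deps.extend(extra.iter().cloned());
        }
        Ok(deps)
    }
}

/// The download plan and the build plan disagreeing is treated as an error.
pub fn check_plans(
    downloaded: &BTreeSet<String>,
    needed: &BTreeSet<String>,
) -> Result<(), String> {
    if downloaded == needed {
        return Ok(());
    }
    let missing: Vec<&String> = needed.difference(downloaded).collect();
    let unused: Vec<&String> = downloaded.difference(needed).collect();
    Err(format!("plan mismatch: missing {:?}, unused {:?}", missing, unused))
}
```

In this model the system package manager would call `free_features` when generating the package, `bind` once the end user's options are known, and `check_plans` to compare what it downloaded against what the build actually needs.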

@softprops

Big fan of official active and continued support for rust in bazel! This would be very timely for my company.

I'm working on making the switch from sbt to bazel for one of our most active repos and at the same time introducing curious engineers to rust. For engineers being onboarded to a new language within an organization it helps to use a familiar toolchain.

That said, cargo is a fantastic tool and does many things better for certain tasks. This is probably mentioned above, but cargo is more than a build tool, it's also a package manager. Bazel is just a build tool and it really excels at just doing that. Losing the package manager aspect is a real drag after having used it. It would be great if cargo could split out the dependency management aspect into a standalone tool which the bazel rules for rust could use.

@dmitry-timofeev

I'm not sure if there is any interest in this or not, but I've been working on Gradle plugins that use Cargo under the hood to build Rust code.

@sholsapp, there certainly is, would love to try it out. It will be useful when building apps that have modules written in various languages (e.g., Java + Rust communicating over JNI or IPC).

Stephen, could you please let me know if you still plan to open-source your Gradle plugin?

@Firstyear

Hi there. I would like to comment on this, as I have been working to integrate Rust into the existing 389 Directory Server project. We use gnu autotools as our build system, and subsequently have a strong requirement to operate with rpm and across a number of other platforms.

I think the summary of my position is that cargo is too "opinionated" about builds and projects. Cargo wants to control your source code, output targets and dependencies, and that conflicts with most build systems.

Cargo will often attempt to download online (even with --frozen - this ruined some live demos for me actually). It also has strict ideas about output of artifacts, especially .so or binaries. A simple task like "cargo, build this bin, and output to this location and name" is difficult to achieve.

Cargo is a great tool, and awesome for green field applications, but its suitability for integration with other tools is just not there. To make cargo work as a child process of autotools or others would require stripping cargo down significantly or adding so many switches and extra complexities to its build process that it may in turn become too fragile or hard to use.

For example, a simple requirement to "cargo install" an object or dylib to a location is not possible today, because cargo's focus is 100% around rlibs or the final resultant executable. For example, we build a number of dylibs that are then installed and dlopened at run time. We can't use cargo to build these .so files, so we produce .a and .o, then use autotools to finish them and install. This process is already pretty fiddly, and often causes problems. For example, when you edit a source file, autotools doesn't know to re-make it (so you have to mark all targets as PHONY).

Another subtle issue is that when we make a -release build we need debug symbols to remain so that rpm can extract them for debuginfo tools, but cargo is "-release" or "-debug". Again, this is because cargo has opinions of how it should be used.

As another example, cargo does duplicate work. Consider I have 3 plugins:

plugin_a
    \- cargo.toml
plugin_b
    \- cargo.toml
plugin_c
    \- cargo.toml

Were I to build all of these in my application, they would all pull their rlib deps individually, and would not share the output or work. So my other option is to run cargo at the root, and have various specifications of libs and build targets (which comes with its own complexity). This then goes back to cargo's "opinion". It wants to control the whole build, rather than allowing smaller translation units.

In the end, I have chosen to stop using cargo, and likely will not utilise it in projects.

I have solved this through the use of git submodules to check out crate dependencies, and careful use of rustc. You can see my example of this as:

RUSTC_FLAGS = @asan_rust_defs@ @debug_rust_defs@ -L $(abs_builddir)/.rlibs/

rlibs:
	-mkdir -p $(abs_builddir)/.rlibs/

.rs.a :
	rustc $(RUSTC_FLAGS) --crate-type staticlib --emit link -o $@ $<

.rs.o :
	rustc $(RUSTC_FLAGS) --crate-type cdylib --emit obj -o $@ $<

libdylib.rlib: rlibs
	rustc $(RUSTC_FLAGS) --crate-type rlib --crate-name dylib --cfg 'feature="libc"' -o $(abs_builddir)/.rlibs/$@ external/dylib/src/lib.rs

liblibrary.rlib: libdylib.rlib rlibs
	rustc $(RUSTC_FLAGS) --crate-type rlib --crate-name library -o $(abs_builddir)/.rlibs/$@ library/src/lib.rs

...

lib_LTLIBRARIES = 	liblibrary.la \
					libplugin_r.la

liblibrary_la_LIBADD = library/src/lib.a
am_liblibrary_la_OBJECTS = library/src/lib.o
liblibrary_la_SOURCES = ""

libplugin_r_la_LIBADD = plugin_r/src/lib.a
am_libplugin_r_la_OBJECTS = plugin_r/src/lib.o
libplugin_r_la_SOURCES = ""
libplugin_r_la_DEPENDENCIES = liblibrary.rlib

I still have the issue with make not knowing when to recompile, but this makes a much clearer and cleaner build for the project. With minimal effort I can now build many .so, they can share rlibs, I do not have to define a make target per object. As well, I gain the ability that autotools has to install shared objects to defined locations.

I hope this helps a bit, but I think that rather than investing in cargo to build rust ubiquitously, why not make cargo the build system for greenfields, and "improve" rustc to work for integration to build systems? This would give you the ability to balance the two, where you can have a feature rich system in Cargo, but the required integration components in rustc that external build tools need.

Thanks,

@ndenev

ndenev commented Jun 23, 2017

@Firstyear I think some of your issues can already be solved (most of this is in Cargo's docs):

  • Use cargo-vendor to pull and save all remote dependencies to disk.
  • To produce a .so file, add this under your Cargo.toml's [lib] section:
    crate-type = ["cdylib"]
  • Set this in your Cargo.toml to add debug symbols to a release optimized binary/lib:
    [profile.release]
    debug = true

@Firstyear

I'll answer these points

  • cargo-vendor is new to me, so I will investigate it to see if it helps resolve the issues we face in distribution of sources.
  • While cargo can produce a .so, we are using autotools - autotools can not handle an external .so being created. It requires .a and .o files, which limits cargo and makes the invocation quite complex to use. To make it worse, let's say we did use cargo to make the .so - there is no way for cargo to install that .so to a meaningful location (make install style), or to do so cleanly and properly.
  • that is a helpful flag for the release profile to know about.

My points about the opinionated nature of cargo still stand however, it feels like I'm fighting cargo to make it do something it doesn't want to - integration with rustc standalone was much easier for us with our existing sources.

Let me put the response a different way: why do we insist on building rust with cargo? Why not make rustc a tier 1 build tool in its own right, and help to improve the rustc build story alongside cargo?

Thanks,

@eternaleye

eternaleye commented Jun 23, 2017

@Firstyear:

autotools can not handle an external .so being created.

Hm? How so? My first instinct would be to put the following in Makefile.am:

LIBRARIES := libfoo.so
PROGRAMS := hello

libfoo.so: rust-foo/src/*.rs rust-foo/Cargo.toml rust-foo/Cargo.lock rust-foo/build.rs
	cd rust-foo && cargo build
	cp -a SO_LOCATION libfoo.so

hello_LDADD := libfoo.so
hello_DEPENDENCIES := libfoo.so

As noted in the manual:

As far as rules are concerned, a user-defined rule overrides any automake-defined rule for the same target.

As a result, you have the full power of make available to you for producing libfoo.so, and being declared in LIBRARIES allows Automake to take care of the install rules, etc.

@acmcarther

My 2c, having spent the last very long time fighting with what you're fighting with @Firstyear, but for a different build tool....

It actually gets worse -- many crates are super coupled to Cargo. Once you solve the vendoring problem, and the Makefile code generation problem for normal crates, you also will get the fun of handling

  • Cargo env vars (CARGO_blah, OUT_DIR, and more)
  • Build.rs files (as a thing)
  • Build.rs helper crates that expect build.rs binary to be in a specific place, with source files in specific places
  • build.rs binaries emitting special cargo-only strings that cargo uses for build planning (cargo:rustc-link-lib=static=foo)

I consider the openssl crate my final boss, as it uses all of these features to tremendous effect. Take a look at that and see how you might try compiling it without Cargo.
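To give a feel for the last item in that list, here is a minimal sketch of what a non-Cargo build tool has to do with a build script's stdout: pick out the `cargo:KEY=VALUE` lines and turn the link-related ones into rustc flags. The `Directive` type and both function names are hypothetical; only a few well-known keys are handled, and everything else (including `rerun-if-changed` and crate-specific metadata) is ignored here.

```rust
/// One `cargo:KEY=VALUE` line printed by a build script.
#[derive(Debug, PartialEq)]
pub struct Directive {
    pub key: String,
    pub value: String,
}

/// Extract directives from build script stdout; other lines are plain logging.
pub fn parse_directives(stdout: &str) -> Vec<Directive> {
    stdout
        .lines()
        .filter_map(|line| line.strip_prefix("cargo:"))
        .filter_map(|rest| rest.split_once('='))
        .map(|(key, value)| Directive {
            key: key.to_string(),
            value: value.to_string(),
        })
        .collect()
}

/// Translate the directives this sketch understands into rustc arguments.
pub fn rustc_flags(directives: &[Directive]) -> Vec<String> {
    let mut flags = Vec::new();
    for d in directives {
        let flag = match d.key.as_str() {
            "rustc-link-lib" => "-l",
            "rustc-link-search" => "-L",
            "rustc-cfg" => "--cfg",
            // Unknown or unsupported keys are skipped in this sketch.
            _ => continue,
        };
        flags.push(flag.to_string());
        flags.push(d.value.clone());
    }
    flags
}
```

For the example string from the list, `cargo:rustc-link-lib=static=foo`, this yields the two arguments `-l` and `static=foo`. A real integration would also have to provide the environment variables the build script expects before running it at all.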

@DemiMarie

Here are my thoughts:

  • Cargo needs to be able to produce .a and .o files.
  • cargo build should be split into two steps that can be run separately:
    1. Fetch all artifacts from crates.io. Store them to a user-specified location.

      This is the “package manager” part of Cargo that @Firstyear identified.

    2. Deterministically compile these artifacts. This can be done either by cargo, or by rustc itself.

@Firstyear

Cargo can already create .a and .o, the issue is that for autotools projects we need .la and .lo. Without these, we get a lot of complaints.

As well, because autotools doesn't know "what changed", it doesn't know when to trigger the build (arguably, you can make the target .PHONY and let cargo handle it.). Cargo-vendor may alleviate the dist pain issues though.

@Firstyear

Firstyear commented Jul 14, 2017

@eternaleye That looks like a nice solution but it doesn't work: autotools doesn't like the raw .so, and doesn't know how to feed them to the linker correctly, so it fails to build :( need to produce .a and .o still sadly.

One of the biggest challenges is getting cargo to know when to rebuild source. autotools only knows that there is a ".a" or ".o", but not what constitutes them. So as a result, you have to make clean to trigger cargo to actually rebuild.

You can't add the target to .PHONY because then you end up building the targets multiple times pointlessly (I saw 8 builds in a complex environment here).

A subtle gripe is the output:

warning: due to multiple output types requested, the explicitly specified output file name will be adapted for each output type

No matter what options or changes I make (even just a single -o target), I can never manage to stop it. Similarly, when you use staticlib, you get a very verbose warning about links and no apparent way to quiet it.

@jpakkane

There really should only be one build system that invokes compilers such as rustc directly. Having a build tool call into a different build tool (Cargo in this case) is flaky. Trying to force every build system in the world to go through Cargo rather than rustc directly is not going to work, sadly.

@rillian

rillian commented Jul 14, 2017

You're free to re-implement cargo in your own build system, of course. I tried that for a bit, but decided it was too much work. But those are the two choices really, calling cargo or re-implementing it, unless you want to maintain a custom build description for all of your rust-language dependencies.

In Firefox we do use the cargo vendor extension and then invoke cargo build --frozen to separate the package download step from the typical build. Updating is a bit fiddly but no worse than other package systems I've used.

A no-op cargo build takes a bit longer than make but it's still faster than compiling anything.

warning: due to multiple output types requested, the explicitly specified output file name will be adapted for each output type

I believe you can get rid of this warning by specifying all the output filenames directly. E.g. drop the -o option, pass --crate-name, and rely on standard output name generation. Or specify names explicitly like --emit=dep-info=foo.d,obj=foo.o,link=libfoo.a.
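The dep-info output mentioned above is also one way to tell an outer build system when a rebuild is needed. As a rough sketch, assuming the file is Makefile-style text of the form `target: input1 input2`, and ignoring escaped spaces and drive-letter colons in paths, it could be consumed like this (both function names are hypothetical):

```rust
/// Return the inputs listed for `target` in Makefile-style dep-info text.
/// Simplification: paths containing spaces or ':' are not handled.
pub fn parse_dep_info(contents: &str, target: &str) -> Vec<String> {
    for line in contents.lines() {
        if let Some((name, deps)) = line.split_once(':') {
            if name.trim() == target {
                return deps.split_whitespace().map(String::from).collect();
            }
        }
    }
    Vec::new()
}

/// The output is stale if it does not exist yet (`None`) or if any input
/// has a newer modification time. Times are plain seconds for simplicity.
pub fn is_stale(output_mtime: Option<u64>, input_mtimes: &[u64]) -> bool {
    match output_mtime {
        None => true,
        Some(out) => input_mtimes.iter().any(|&m| m > out),
    }
}
```

Since the file is already in Makefile syntax, a make-based build could alternatively include it directly rather than parse it; the parsing route is mainly useful for build tools that are not make.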

You can parse the dependent library output for a staticlib and feed it back into your eventual link line. This needs to be dynamic since it can be affected by conditional compilation. It would be nice if the compiler could write that to a file instead.
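A small sketch of that parsing step follows. It assumes the compiler prints a single line containing `native-static-libs:` followed by linker arguments, which is the form later rustc versions use with `--print native-static-libs`; the wording of the note at the time of this comment may differ, so treat the marker string as an assumption. The function name is hypothetical.

```rust
/// Pull the native library link arguments out of captured compiler output.
/// Assumes a line like `note: native-static-libs: -lfoo -lbar`.
pub fn native_link_args(compiler_output: &str) -> Vec<String> {
    const MARKER: &str = "native-static-libs:";
    for line in compiler_output.lines() {
        if let Some(pos) = line.find(MARKER) {
            // Everything after the marker is a whitespace-separated arg list.
            return line[pos + MARKER.len()..]
                .split_whitespace()
                .map(String::from)
                .collect();
        }
    }
    Vec::new()
}
```

The returned arguments would be appended to the final link line of whatever links the static library. Because the list depends on conditional compilation, it has to be re-read on every build rather than hard-coded.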

@EelcoHoogendoorn

Being quite new to rust I may be missing many nuances in this discussion; but coming from python it strikes me that it is the conda (binary) package management that is sorely lacking here. Note that conda is not python-specific but language-agnostic, and is also used for quite a few C++ projects nowadays. The main 'problem' as it strikes me is that it has quite a bit of overlap with the functionality of cargo; but is there a way for these two tools to put their egos aside and combine elegantly in some manner?

@rillian

rillian commented Aug 23, 2017

Is it just the distribution of binary packages (as well as source, like cargo does) that you're interested in, e.g. to save waiting on build times? Or is there something else conda offers which cargo doesn't? I've never used conda, so it would help me understand what you mean if you could offer more details.

It would certainly be possible to add a binary caching layer to cargo or crates.io. Possibly by adding some kind of sccache integration to cargo. I think it hasn't been a priority so far because every rust release has an ABI change, but as applications written in rust get more complex and popular being able to install without the compile step gets more interesting. I believe there's also a few more steps to get to reproducible builds, which are important for verifying distributed binaries.

@EelcoHoogendoorn

Build times are one thing; the last python project I worked on had 2gb of binary dependencies; building all of that from source would have been no fun whatsoever; perhaps bordering on practically infeasible.

But it's mostly the fact that I end up getting dragged into more internal compilation details than I was used to with conda; for instance, the need to have a fortran compiler installed and configured and whatnot. I wasn't planning on getting involved in any of that; I just wanted to use netlib as a dependency. That kind of 'leaking' of concerns across package boundaries isn't going to scale to large and complex dependency graphs as well as conda did.

So it's not so much the 'distribution' of the binaries that I think is the crucial part (though a considerable convenience); but rather it is the pushing of encapsulation of nontrivial concerns as far as possible which I think makes a large difference for the scalability of an ecosystem.

@rillian

rillian commented Aug 23, 2017

I see. It's more about needing a way for crates with build.rs components to package the additional environment components they need for each platform. That's a good point.

@aturon
Member Author

aturon commented Sep 1, 2017

An eRFC on this topic is now up!

@Firstyear

Okay, so since then I want to refine this: I have some success with .so for binaries, but trying to link two libraries is an issue.

We have a pluggable system which generates .so files which we dlopen, and those libraries link to other libraries. For example:

libuiduniq.so -> libslapd.so.

Now the issue I face is that we have a library for datastructures which is librsds.so - the library I have a rust poc in. The issue is linking libuiduniq.la to this can not be done. It's quite challenging to make this configuration play nicely.

Second - .so versioning. If we use cargo to make the .so, we lose the ability to provide these versions in our .so files, so having cargo output resources that can be used by autotools for .la, and allowing libtool to create the .so will really ease the process of integration.
