feat: a zlib plugin #2244

Closed
juntao opened this issue Jan 31, 2023 · 17 comments
Labels
c-Plugin An issue related to WasmEdge Plugin feature help wanted Extra attention is needed LFX Mentorship Tasks for LFX Mentorship participants

Comments

@juntao
Member

juntao commented Jan 31, 2023

Motivation

zlib is required for compiling and running many existing C / C++ / Rust apps in Wasm. Most notably, it is required for the Python port to Wasm. The VMware Wasm Labs team is using a zlib port from Singlestore in their Python Wasm runtime.

In WasmEdge, we could support the zlib host functions through our plug-in system. This way, any existing zlib app can be compiled to Wasm and run inside WasmEdge. The immediate benefits of this approach are threefold:

  1. By using native zlib libraries, we could improve the performance of zlib apps in wasm, especially in the interpreter mode.
  2. It is significantly easier for developers to just compile their programs to wasm, as opposed to having to compile zlib itself to wasm and then link with their apps.
  3. There is no need for developers to register host functions for zlib in their apps. The plug-in does it from inside WasmEdge.

Since we are supporting zlib from within WasmEdge, we are not introducing another host app to wrap around WasmEdge. WasmEdge remains the "container" of the app. The zlib apps would be able to seamlessly run in WasmEdge embedded in Docker Desktop, Kubernetes, containerd, OpenShift, and other container tools.

Details

Create a plug-in for zlib host functions

Apply here

https://mentorship.lfx.linuxfoundation.org/project/74cecdf7-e886-4830-8bb0-7814f0d1aa2d

@juntao juntao added help wanted Extra attention is needed feature c-Plugin An issue related to WasmEdge Plugin labels Jan 31, 2023
@lengrongfu

As a newbie to WasmEdge, can I contribute this plugin?

@juntao
Member Author

juntao commented Feb 1, 2023

As a newbie to WasmEdge, can I contribute this plugin?

Of course! You do need to have a good understanding of C++. Please read this doc first: https://wasmedge.org/book/en/plugin.html

@lengrongfu

Sorry, I thought the plugin was written in Rust, and I have no experience with this in C++. Others, please feel free to contribute.

@hydai hydai added the GSoC Tasks for Google Summer of Code participants label Feb 7, 2023
@vibhu1805

@juntao, I do have a good understanding of C++. Can you provide me with resources to complete this issue, as I am a newbie to WasmEdge?

@juntao
Member Author

juntao commented Feb 28, 2023

Sure. See the following SDKs for creating WasmEdge plugins:

C++: https://wasmedge.org/book/en/plugin.html

Rust: https://github.com/second-state/wasmedge_plugin_rust_sdk

@littledivy

littledivy commented Mar 10, 2023

@juntao Hi, i'd like to work on this using the Rust SDK. IIUC the new plugin would be in
https://github.com/WasmEdge/WasmEdge/tree/master/plugins with an isolated dylib build system?

@juntao
Member Author

juntao commented Mar 10, 2023

Hi @littledivy

Would love to see your contribution! Yes, we would like this to be an "official" plugin. So, let's put it in the official repo like other plugins.

@littledivy

Hey @juntao

I was able to setup a Rust plugin and some zlib functions work in this POC: https://github.com/WasmEdge/WasmEdge/compare/master...littledivy:WasmEdge:zlib_plugin?expand=1

I have a question, are we planning to expose the raw zlib C API to WASM?

If so, how are pointers supposed to be handled across the WASM <-> host memory boundary? And of course the struct layout on the 64-bit host differs from that of the 32-bit WASM code. My POC does some very hacky/unsafe WASM-to-host struct layout copy-conversion, which I think isn't ideal.

A workaround is to not expose the exact zlib API and instead write a wrapper plugin (something like Node's zlib module), but that won't be a drop-in replacement for existing apps.

@juntao
Member Author

juntao commented Mar 11, 2023

Hi @littledivy

Thanks! I think we need a rust crate for zlib. When it is compiled into Wasm, the bytecode will call your host functions. Maybe similar to this

https://crates.io/crates/libz-sys

An example is the WasmEdge WASI socket crate, which provides Wasm access to the socket-related host functions.

https://crates.io/crates/wasmedge_wasi_socket

@littledivy

littledivy commented Mar 11, 2023

@juntao Yup understood, I'm using the zlib_sys crate. However, I realised my question wasn't clear, there's a problem with this approach:

// WASM bytecode
let mut z_stream = unsafe { zeroed() };

// Calls into Host function
deflateInit_(&mut z_stream, ...);
// Host function
fn deflateInit_(
  frame: CallingFrame,
  inputs: Vec<WasmValue>,
) -> Result<Vec<WasmValue>, HostFuncError> {
  let strm = get_frame_pointer(&frame, inputs[0]);

  zlib_sys::deflateInit_(strm, ...);
  // ...
}

The host's z_stream (64bit) is different from what the WASM bytecode (32bit) assumes. This code will crash / produce UB.

WASI sockets are designed to not involve sharing raw pointers between host and wasm bytecode, hence this is not a problem there.
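To make the mismatch concrete, here is a minimal C++ sketch. It does not include zlib.h: `HostZStream` is a hypothetical mirror of zlib's `z_stream` field order using the host's native pointer and `unsigned long` widths, and `Wasm32ZStream` is the same record as a wasm32 guest would see it (4-byte pointers and longs). Both names, and `layouts_match`, are illustrations only.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of z_stream as a native host compiler lays it out.
// Field widths follow the host ABI (8-byte pointers on a 64-bit host).
struct HostZStream {
  const unsigned char *next_in;
  unsigned int avail_in;
  unsigned long total_in;
  unsigned char *next_out;
  unsigned int avail_out;
  unsigned long total_out;
  const char *msg;
  void *state;
  void *zalloc; // function pointers in zlib; void* is enough for sizing here
  void *zfree;
  void *opaque;
  int data_type;
  unsigned long adler;
  unsigned long reserved;
};

// The same record as a wasm32 guest sees it: every field is 4 bytes wide.
struct Wasm32ZStream {
  std::uint32_t next_in, avail_in, total_in;
  std::uint32_t next_out, avail_out, total_out;
  std::uint32_t msg, state;
  std::uint32_t zalloc, zfree, opaque;
  std::int32_t data_type;
  std::uint32_t adler, reserved;
};

static_assert(sizeof(Wasm32ZStream) == 56, "14 fields of 4 bytes each");

// True only if the host layout happens to coincide with the guest layout.
constexpr bool layouts_match() {
  return sizeof(HostZStream) == sizeof(Wasm32ZStream) &&
         offsetof(HostZStream, next_out) == offsetof(Wasm32ZStream, next_out);
}
```

On a typical 64-bit Linux host `layouts_match()` is false, so passing the guest pointer directly to the host zlib would make it read and write fields at the wrong offsets.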

@hydai
Member

hydai commented Mar 13, 2023

How about separating a 64-bit type into two 32-bit types? Concatenate them in the host function side and then do the decode.
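One possible reading of this suggestion, sketched in C++: the guest passes a 64-bit value as two 32-bit arguments and the host function reassembles it. `split_u64` and `join_u64` are hypothetical helper names, not WasmEdge APIs.

```cpp
#include <cstdint>
#include <utility>

// Guest side: split a 64-bit value into {low, high} 32-bit halves.
std::pair<std::uint32_t, std::uint32_t> split_u64(std::uint64_t value) {
  return {static_cast<std::uint32_t>(value & 0xFFFFFFFFu),
          static_cast<std::uint32_t>(value >> 32)};
}

// Host side: concatenate the two halves back into the 64-bit value.
std::uint64_t join_u64(std::uint32_t low, std::uint32_t high) {
  return (static_cast<std::uint64_t>(high) << 32) | low;
}
```

This covers the width of individual arguments; whether it helps with a struct that lives in guest memory is a separate question.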

@littledivy

@hydai I'm not sure I understand. How would that look for the deflateInit example? It's also the layout of the struct z_stream that's entirely different between WASM and host.

@hydai
Member

hydai commented Mar 15, 2023

Hi @littledivy
Sorry, I'm also getting confused :-(

Could you please explain the details of "The host's z_stream (64bit) is different from what the WASM bytecode (32bit) assumes. This code will crash / produce UB."?

  1. Which part will be generated to WASM?
  2. Which part will be the host function?
  3. Which part will be the WasmEdge rust binding?
  4. What types of each class?

@NeelDigonto
Contributor

NeelDigonto commented Apr 3, 2023

I have set up a minimal but complete zlib deflate and inflate test program and ran it successfully with the WasmEdge C SDK and the necessary host functions. After further design decisions, we can turn it into an official plugin.

Test Host & Module Implementation

Now I will go through the important factors, assumptions, and design decisions that need to be discussed.
I am writing down how I approached this issue and asking for opinions or open discussion at each step.

My approach / plan:

  1. Learn about the zlib library.
    - Found out that madler/zlib is currently in maintenance mode and not accepting major changes (faced a problem with incorrect CMakeLists and no action on the issue for multiple years).
    - Chose zlib-ng as the most suitable candidate.
    - Followed the official zlib guide (https://www.zlib.net/zlib_how.html) and test ran zpipe.c, which copies data from a file chunk by chunk, compresses it with deflate, and decompresses it with inflate in the same chunked manner.
    - Set up the wasmedge and emsdk libraries.

  2. The PLAN:

    • We will try to make sure that the zlib application requires minimal to no code changes to work under this plugin.
    • Our plan is to use the host Zlib implementation to do the heavy lifting.
    • In zlib, the z_stream struct holds most of the information about the data to compress and decompress, along with zlib's internal state.
    • Our plan is to use the wasm z_stream as a dummy state and sync the host and wasm z_stream before and after any zlib calls.
    • The wasm module will include the zlib header but not link against the library, so we will end up with a few function imports.
    • We would like to compile our C++ wasm code using Emscripten as a MAIN_MODULE, so we will need to pass the flag -sWARN_ON_UNDEFINED_SYMBOLS=0 to suppress unresolved externs in the main module.
    • Now we have roughly four important issues left to discuss: memory layout, how to handle multiple z_stream objects, how to handle the zlib internal state, and how to call the wasm module's custom zalloc and zfree routines from the host and deal with host zlib referencing memory in the wasm space.
    • About the zlib internal state, I believe we don't need to expose it or sync it to wasm from host.
      In zlib.h LINE#116, it is clearly written: struct internal_state *state; /* not visible by applications */. I have also checked the cpython repo (cpython/Modules/zlibmodule.c) for any hacky reference to zlib's internal state and could find none.
    • Now that we have eliminated the need to expose zlib's internal state to the wasm module, we have no use for zalloc and zfree, which were used solely for the zlib internal state. We can simply ignore the wasm module's zalloc and zfree
      and initialize the host zalloc and zfree with Z_NULL, or whatever we choose, and we will have complete freedom over it. The only side effect I can think of is that in cpython PyMem_RawMalloc won't be called, so the Python memory usage stats may be a bit lower, but that is nothing to worry about, I guess.
    • Now about the multiple z_stream issue, which will occur if we have multiple zlib instances being used at once in the wasm module, I hope we can simply solve this with a lookup table. For example we can have std::unordered_map<uint32_t, z_stream *> stream_map;, where the key is the wasm memory offset, and value is the host z_stream object pointer. This is the approach I am currently using, and will see if it causes any problem while porting the code to a plugin. No smart pointers for now. Any inputs here will be very helpful.
    • Now the most important part, the memory layout.
    • The concerns are z_stream struct layout, padding & alignment, endianness and byte representation like 2's complement and LEB128.
    • From my research, wasm is and will only be little endian, and almost all servers and desktops are little endian, so it's not a big problem, even though in my sample I check the machine's endianness up front.
    • Now about the z_stream struct padding and alignment issue: we need to talk about this because most hosts running WasmEdge will be 64-bit servers or processors, and wasm is strictly 32-bit until the memory64 proposal lands.
      Assuming the wasm is 32-bit, which it most likely will be, we can define a struct on the host side like this ->
      struct wasm_z_stream {
      uint32_t next_in;
      uint32_t avail_in;
      uint32_t total_in;
      
      uint32_t next_out;
      uint32_t avail_out;
      uint32_t total_out;
      
      uint32_t msg;
      uint32_t state;
      
      uint32_t zalloc;
      uint32_t zfree;
      uint32_t opaque;
      
      int32_t data_type;
      
      uint32_t adler;
      uint32_t reserved;
      }; // 56 bytes
    • Here every item is aligned on a 4-byte boundary. We can do this because we know the z_stream struct beforehand and we know its memory layout when compiled by LLVM backends or any standard-abiding toolchain (see WASM BasicCABI). Pointers are 4 bytes in size in wasm.
    • There is an issue though with Rust, which might or might not have been resolved (see rustc C ABI).
    • But for Emscripten/LLVM it's all clear.
    • Now we come to 2's complement: I am assuming the host is 2's complement, and wasm is guaranteed to be 2's complement by default.
    • All this leads to the conclusion that I can simply get a pointer to a struct in wasm linear memory and access it with my handy dandy struct wasm_z_stream, defined above.
    • With these I have covered all the issues that came to my mind, and I am open to discussion of any other gotchas, pitfalls, or design decisions that need to be taken care of.
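On the lookup-table point, one alternative to storing raw `z_stream *` values is to let the map own the host-side state. The sketch below is an illustration under assumptions, not the plugin's design: `HostStream` is a hypothetical stand-in for zlib's `z_stream` so that the example compiles without zlib, and `StreamRegistry` and its method names are invented here.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>

// Hypothetical stand-in for z_stream; a real plugin would hold z_stream here.
struct HostStream {
  std::uint32_t avail_in = 0;
  std::uint32_t avail_out = 0;
};

// Maps a guest z_stream offset (in wasm linear memory) to owned host state.
class StreamRegistry {
public:
  // Registers a new host stream; returns nullptr if the offset is already in use.
  HostStream *create(std::uint32_t guest_offset) {
    auto [it, inserted] = streams_.try_emplace(guest_offset, nullptr);
    if (!inserted)
      return nullptr;
    it->second = std::make_unique<HostStream>();
    return it->second.get();
  }

  // Looks up a stream; returns nullptr when the guest never initialized it.
  HostStream *find(std::uint32_t guest_offset) const {
    auto it = streams_.find(guest_offset);
    return it == streams_.end() ? nullptr : it->second.get();
  }

  // Drops the host state; the unique_ptr frees it. Returns false if absent.
  bool erase(std::uint32_t guest_offset) {
    return streams_.erase(guest_offset) == 1;
  }

  std::size_t size() const { return streams_.size(); }

private:
  std::unordered_map<std::uint32_t, std::unique_ptr<HostStream>> streams_;
};
```

Returning `nullptr` rather than throwing lets a host function translate a missing stream into a zlib-style error code, which may be preferable to letting a C++ exception propagate through a C callback boundary.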
  3. WASM Zlib Code.

    • CODE : Module.cpp
    • GOAL: To achieve a complete deflateInit -> deflate -> deflateEnd -> inflateInit -> inflate -> inflateEnd cycle.
    • I simplified zpipe.c: removed the file reads and writes, read the input as a single chunk from a memory buffer (for later benchmarking), and wrote a more C++-ish version of the inflate and deflate routines.
    • CPython provides custom malloc and free to zlib for its internal state allocation.
      I do the same with
      void *custom_malloc(voidpf opaque, uInt items, uInt size) {
      
        auto add = malloc(items * size);
      #ifndef __EMSCRIPTEN__
        std::cout << "zalloc : " << add << " = " << items * size << std::endl;
      #endif
        return add;
      }
      
      void custom_free(voidpf opaque, voidpf address) {
      #ifndef __EMSCRIPTEN__
        std::cout << "zfree : " << address << std::endl;
      #endif
        return free(address);
      }
    • But as mentioned in point 2, the host will ignore these and assign Z_NULL.
    • The rest is fairly standard C++, but you might be irritated by the generous #ifndef __EMSCRIPTEN__.
    • I used these to remove console output and avoid C++ exceptions while compiling under Emscripten.
    • The int test() function is called by the WasmEdge host.
    • Under normal compilation with gcc, the main function calls the test() function.
    • The test case and data generation code:
      #ifdef __EMSCRIPTEN__
      #define PRESERVE EMSCRIPTEN_KEEPALIVE
      #else
      #define PRESERVE
      #endif
      
      static constexpr size_t DATA_SIZE = 1 * 1024 * 1024;
      static constexpr size_t BUFFER_SIZE = 16'384; // 16 * 1024
      
      constexpr auto randChar = []() -> char {
        constexpr char charset[] = "0123456789"
                                  "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                                  "abcdefghijklmnopqrstuvwxyz";
        constexpr size_t max_index = (sizeof(charset) - 1);
        return charset[rand() % max_index];
      };
      
      extern "C" int PRESERVE test() {
        std::vector<char> data(DATA_SIZE, {});
        std::generate_n(std::begin(data), DATA_SIZE, randChar);
      
      #ifndef __EMSCRIPTEN__
        std::cout << "Compressing Buffer of size : " << DATA_SIZE << "B" << std::endl;
      #endif
        const auto compressed_buffer = Deflate(data, 6);
      
      #ifndef __EMSCRIPTEN__
        std::cout << "Decompressing Buffer of size : " << compressed_buffer.size()
                  << "B" << std::endl;
      #endif
        const auto decompressed_buffer = Inflate<char>(compressed_buffer);
      
        auto comp_res = data == decompressed_buffer;
      #ifndef __EMSCRIPTEN__
        std::cout << (comp_res ? "Success" : "Fail") << std::endl;
      #endif
      
        return comp_res;
      }
      
      int main() {
        test();
        return 0;
      }
    • Now the deflate routine:
      template <typename T>
      std::vector<unsigned char> Deflate(const std::vector<T> &source,
                                        int level = -1) {
    
        int ret, flush;
        z_stream strm;
        ret = InitDeflateZStream(strm, level);
        const std::size_t src_size = source.size() * sizeof(T);
        std::size_t out_buffer_size = src_size / 3 + 16;
        std::vector<unsigned char> out_buffer(out_buffer_size, {});
    
        strm.avail_in = src_size;
        strm.next_in = reinterpret_cast<unsigned char *>(
            const_cast<std::remove_const_t<T> *>(source.data()));
        strm.avail_out = out_buffer.size();
        strm.next_out = out_buffer.data();
    
        do {
    
          if (strm.avail_out == 0) {
            const std::size_t extension_size = src_size / 3 + 16;
            strm.avail_out = extension_size;
            out_buffer.resize(out_buffer_size + extension_size, {});
            strm.next_out = std::next(out_buffer.data(), out_buffer_size);
            out_buffer_size += extension_size;
          }
    
          ret = deflate(&strm, Z_FINISH);
    
      #ifndef __EMSCRIPTEN__
          if (ret == Z_STREAM_ERROR)
            throw std::runtime_error("Zlib Stream Error!");
      #endif
        } while (ret != Z_STREAM_END);
    
        deflateEnd(&strm);
        out_buffer.resize(out_buffer_size - strm.avail_out);
    
        return out_buffer;
      }
    • The Inflate routine:
      template <typename T>
      std::vector<T> Inflate(const std::vector<unsigned char> &source) {
    
        int ret, flush;
        z_stream strm;
        ret = InitInflateZStream(strm);
        const std::size_t src_size = source.size();
        std::size_t out_buffer_size = src_size / 3 + 16;
        std::vector<unsigned char> out_buffer(out_buffer_size, {});
    
        strm.avail_in = src_size;
        strm.next_in = const_cast<unsigned char *>(source.data());
        strm.avail_out = out_buffer.size();
        strm.next_out = out_buffer.data();
    
        do {
    
          if (strm.avail_out == 0) {
            const std::size_t extension_size = src_size / 3 + 16;
            strm.avail_out = extension_size;
            out_buffer.resize(out_buffer_size + extension_size, {});
            strm.next_out = std::next(out_buffer.data(), out_buffer_size);
            out_buffer_size += extension_size;
          }
    
          ret = inflate(&strm, Z_FINISH);
    
      #ifndef __EMSCRIPTEN__
          if (ret == Z_STREAM_ERROR)
            throw std::runtime_error("Zlib Stream Error!");
      #endif
        } while (ret != Z_STREAM_END);
    
        inflateEnd(&strm);
        out_buffer_size -= strm.avail_out;
    
        std::vector<T> ret_buffer(reinterpret_cast<T *>(out_buffer.data()),
                                  std::next(reinterpret_cast<T *>(out_buffer.data()),
                                            (out_buffer_size / sizeof(T))));
    
        return ret_buffer;
      }
    • With gcc:
        fathomless@vividecstasy:~/repo/wasmedge-zlib/src$ g++ -O2 module.cpp -o module -lz && ./module
        Compressing Buffer of size : 1048576B
        zalloc : 0x563ffacb22c0 = 5952
        zalloc : 0x563ffacb3a10 = 65536
        zalloc : 0x563ffacc3a20 = 65536
        zalloc : 0x563ffacd3a30 = 65536
        zalloc : 0x563fface3a40 = 65536
        zfree : 0x563fface3a40
        zfree : 0x563ffacd3a30
        zfree : 0x563ffacc3a20
        zfree : 0x563ffacb3a10
        zfree : 0x563ffacb22c0
        Decompressing Buffer of size : 788616B
        zalloc : 0x563ffacb22c0 = 7160
        zalloc : 0x563ffacf41b0 = 32768
        zfree : 0x563ffacf41b0
        zfree : 0x563ffacb22c0
        Success
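The output-buffer growth logic shared by `Deflate` and `Inflate` above can be exercised without zlib. The sketch below assumes a hypothetical `MockStream` with the same four cursor fields as `z_stream`, and a stand-in `mock_step` that merely copies bytes instead of compressing; only the buffer handling mirrors the routines above.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the input/output cursors of z_stream.
struct MockStream {
  const unsigned char *next_in = nullptr;
  std::size_t avail_in = 0;
  unsigned char *next_out = nullptr;
  std::size_t avail_out = 0;
};

// Stand-in for deflate()/inflate(): copies as much as fits into the output
// window and returns true once all input has been consumed.
bool mock_step(MockStream &s) {
  const std::size_t n = std::min(s.avail_in, s.avail_out);
  if (n > 0)
    std::memcpy(s.next_out, s.next_in, n);
  s.next_in += n;
  s.avail_in -= n;
  s.next_out += n;
  s.avail_out -= n;
  return s.avail_in == 0;
}

// Runs mock_step until done, extending the output by `chunk` bytes whenever
// the output window is exhausted.
std::vector<unsigned char> drain(const std::vector<unsigned char> &source,
                                 std::size_t chunk) {
  if (chunk == 0)
    chunk = 1; // a zero-sized window could never make progress
  std::vector<unsigned char> out(chunk);
  MockStream s;
  s.next_in = source.data();
  s.avail_in = source.size();
  s.next_out = out.data();
  s.avail_out = out.size();

  bool done = false;
  while (!done) {
    if (s.avail_out == 0) {
      const std::size_t used = out.size();
      out.resize(used + chunk);
      // resize may reallocate, so next_out must be recomputed from data().
      s.next_out = out.data() + used;
      s.avail_out = chunk;
    }
    done = mock_step(s);
  }
  out.resize(out.size() - s.avail_out); // trim the unused tail
  return out;
}
```

The key detail is recomputing `next_out` after every `resize`, since a pointer into the old allocation may dangle once the vector grows.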
  4. The Host code explanation:

    • SRC: host.cpp
    • Step by step breakup of my host code.
    • On entry I check that I am on a little-endian machine; this check can be removed, since little endian is nearly universal nowadays.
      struct Util {
        std::unordered_map<uint32_t, z_stream *> stream_map;
      };
    
      int main() {
        if (!isLittleEndian())
          throw std::runtime_error("Will support Big Endian Later.");
    
        Util util; 
    • Now I set up the WasmEdge VM
         WasmEdge_ConfigureContext *ConfCxt = WasmEdge_ConfigureCreate();
         WasmEdge_ConfigureAddHostRegistration(ConfCxt,
                                               WasmEdge_HostRegistration_Wasi);
         WasmEdge_VMContext *VMCxt = WasmEdge_VMCreate(ConfCxt, NULL);
    
         WasmEdge_String ExportName = WasmEdge_StringCreateByCString("env");
         WasmEdge_ModuleInstanceContext *HostModCxt =
             WasmEdge_ModuleInstanceCreate(ExportName);
    • Now the memory instance: since I compile with -sIMPORTED_MEMORY -sINITIAL_MEMORY=128MB -sALLOW_MEMORY_GROWTH=0, I use a page count of 16 * 128 = 2048 (each wasm page is 64 KiB, so 16 pages make 1 MiB), i.e. 128 MiB
         WasmEdge_Limit MemLimit = {
             .HasMax = true, .Shared = false, .Min = 16 * 128, .Max = 16 * 128};
         WasmEdge_MemoryTypeContext *HostMType = WasmEdge_MemoryTypeCreate(MemLimit);
         WasmEdge_MemoryInstanceContext *HostMemory =
             WasmEdge_MemoryInstanceCreate(HostMType);
         WasmEdge_String MemoryName = WasmEdge_StringCreateByCString("memory");
         WasmEdge_ModuleInstanceAddMemory(HostModCxt, MemoryName, HostMemory);
         WasmEdge_MemoryTypeDelete(HostMType);
    • Now let's jump to how I call the wasm module
         RegisterHostFunction("inflateEnd", WasmEdge_ZlibEnd<inflateEnd>,
                             {WasmEdge_ValType_I32}, {WasmEdge_ValType_I32}, &util,
                             HostModCxt);
    
         WasmEdge_VMRegisterModuleFromImport(VMCxt, HostModCxt);
    
         WasmEdge_String EntryPoint = WasmEdge_StringCreateByCString("test");
         // Zero-length arrays are not standard C++; pass nullptr for the
         // empty parameter list instead.
         WasmEdge_Value EntryPointReturns[1];
         WasmEdge_Result Res =
             WasmEdge_VMRunWasmFromFile(VMCxt, "./module.wasm", EntryPoint,
                                       nullptr, 0, EntryPointReturns, 1);
         if (WasmEdge_ResultOK(Res)) {
           const auto test_res = WasmEdge_ValueGetI32(EntryPointReturns[0]);
           printf("Test Result : %s\n", test_res ? "Success" : "Failed");
         } else {
           printf("Error message: %s\n", WasmEdge_ResultGetMessage(Res));
         }
    
         /* Resources deallocations. */
         WasmEdge_VMDelete(VMCxt);
         WasmEdge_ConfigureDelete(ConfCxt);
         WasmEdge_StringDelete(EntryPoint);
         return 0;
       }
    • Now coming to the host function registration, we have
    deflateInit_ // deflateInit is actually a zlib macro which calls deflateInit_
    deflate
    deflateEnd
    inflateInit_ // similar story as deflateInit_
    inflate
    inflateEnd
  • We create a function to ease registering our WasmEdge Host Functions.
   static void
   RegisterHostFunction(const std::string &_function_name,
                       WasmEdge_HostFunc_t _func_pointer,
                       std::vector<WasmEdge_ValType> _params_list,
                       std::vector<WasmEdge_ValType> _return_list, Util *_util,
                       WasmEdge_ModuleInstanceContext *_module_context) {
     WasmEdge_String HostFuncName =
         WasmEdge_StringCreateByCString(_function_name.c_str());

     WasmEdge_FunctionTypeContext *HostFType =
         WasmEdge_FunctionTypeCreate(_params_list.data(), _params_list.size(),
                                     _return_list.data(), _return_list.size());
     WasmEdge_FunctionInstanceContext *HostFunc =
         WasmEdge_FunctionInstanceCreate(HostFType, _func_pointer, _util, 0);
     WasmEdge_ModuleInstanceAddFunction(_module_context, HostFuncName, HostFunc);
     WasmEdge_FunctionTypeDelete(HostFType);
     WasmEdge_StringDelete(HostFuncName);
   }
  • I am skipping the registration of the host functions here and jumping directly to their definitions.
    WasmEdge_Result
    WasmEdge_deflateInit_(void *Data,
                          const WasmEdge_CallingFrameContext *CallFrameCxt,
                          const WasmEdge_Value *In, WasmEdge_Value *Out) {
      uint32_t wasm_z_stream_ptr = (uint32_t)WasmEdge_ValueGetI32(In[0]);
      int32_t wasm_level = WasmEdge_ValueGetI32(In[1]);
      uint32_t wasm_version_ptr = (uint32_t)WasmEdge_ValueGetI32(In[2]);
      int32_t wasm_stream_size = WasmEdge_ValueGetI32(In[3]);

      ValidateWasmZStream(CallFrameCxt, wasm_z_stream_ptr, wasm_version_ptr,
                          wasm_stream_size);
      auto stream = GetInitHostZStream(CallFrameCxt, wasm_z_stream_ptr);

      const auto z_res =
          deflateInit_(stream, wasm_level, ZLIB_VERSION, sizeof(z_stream));

      Out[0] = WasmEdge_ValueGenI32(z_res);

      reinterpret_cast<Util *>(Data)->stream_map.insert(
          {wasm_z_stream_ptr, stream});

      return WasmEdge_Result_Success;
    }
  • The important thing to notice here is the ValidateWasmZStream and GetInitHostZStream functions.
    ValidateWasmZStream:
      void ValidateWasmZStream(const WasmEdge_CallingFrameContext *CallFrameCxt,
                              uint32_t _wasm_z_stream_ptr,
                              uint32_t _wasm_version_ptr,
                              int32_t _wasm_stream_size) {
        WasmEdge_MemoryInstanceContext *MemCxt =
            WasmEdge_CallingFrameGetMemoryInstance(CallFrameCxt, 0);
        wasm_z_stream *wasm_stream =
            reinterpret_cast<wasm_z_stream *>(WasmEdge_MemoryInstanceGetPointer(
                MemCxt, _wasm_z_stream_ptr, sizeof(wasm_z_stream)));
        const char *wasm_ZLIB_VERSION = reinterpret_cast<const char *>(
            WasmEdge_MemoryInstanceGetPointer(MemCxt, _wasm_version_ptr, 1));

        // Check major version of zlib and assert sizeof z_stream == 56

        if (wasm_ZLIB_VERSION[0] != ZLIB_VERSION[0])
          throw std::runtime_error(std::string("Host(") + ZLIB_VERSION[0] +
                                  ") and Wasm Module(" + wasm_ZLIB_VERSION[0] +
                                  ") zlib Major Version does not match!");

        if (_wasm_stream_size != 56)
          throw std::runtime_error(std::string("WASM sizeof(z_stream) != 56 but ") +
                                  std::to_string(_wasm_stream_size));
      }
  • I perform two validations here: assert that the wasm module's z_stream is 56 bytes in size, and that the major versions of zlib match, i.e. the zlib header version included by the wasm module and the host zlib version.
  • GetInitHostZStream simply creates a z_stream and returns it; later it might take on other roles too, such as copying certain settings from the wasm z_stream to the host z_stream.
  • WasmEdge_inflateInit_ is similar.
  • The actual deflate routine is as follows:
      template <auto &Func>
      WasmEdge_Result WasmEdge_algo1(void *Data,
                                    const WasmEdge_CallingFrameContext *CallFrameCxt,
                                    const WasmEdge_Value *In, WasmEdge_Value *Out) {
        uint32_t wasm_z_stream_ptr = (uint32_t)WasmEdge_ValueGetI32(In[0]);
        int32_t wasm_flush = WasmEdge_ValueGetI32(In[1]);

        auto stream_map_it =
            reinterpret_cast<Util *>(Data)->stream_map.find(wasm_z_stream_ptr);

        if (stream_map_it == reinterpret_cast<Util *>(Data)->stream_map.end())
          throw std::runtime_error("ZStream not found in map");

        auto stream = stream_map_it->second;

        WasmEdge_MemoryInstanceContext *MemCxt =
            WasmEdge_CallingFrameGetMemoryInstance(CallFrameCxt, 0);
        uint8_t *wasm_mem =
            WasmEdge_MemoryInstanceGetPointer(MemCxt, 0, 128 * 1024 * 1024);

        wasm_z_stream *wasm_stream =
            reinterpret_cast<wasm_z_stream *>(wasm_mem + wasm_z_stream_ptr);

        stream->avail_in = wasm_stream->avail_in;
        stream->avail_out = wasm_stream->avail_out;
        stream->next_in = wasm_mem + wasm_stream->next_in;
        stream->next_out = wasm_mem + wasm_stream->next_out;

        const auto z_res = Func(stream, wasm_flush);

        // now write it to wasm memory
        wasm_stream->avail_in = stream->avail_in;
        wasm_stream->avail_out = stream->avail_out;
        wasm_stream->next_in = stream->next_in - wasm_mem;
        wasm_stream->next_out = stream->next_out - wasm_mem;

        Out[0] = WasmEdge_ValueGenI32(z_res);

        return WasmEdge_Result_Success;
      }
  • Here I search the lookup table for the host z_stream object.
  • After that I get access to the whole wasm linear memory.
            WasmEdge_MemoryInstanceGetPointer(MemCxt, 0, 128 * 1024 * 1024);
  • This might not be what you prefer, and you might want finer-grained memory access, but it simplifies a lot of things and helps reduce function calls. We can discuss this later and talk about what might need to change, if anything, after this lands: New WASM memory shrink.
  • I perform some simple pointer arithmetic and sync the host and wasm z_stream structs before and after calls to zlib library functions.
  • The deflateEnd and inflateEnd functions are pretty standard, so skipping explanation.
      template <auto &Func>
      WasmEdge_Result
      WasmEdge_ZlibEnd(void *Data, const WasmEdge_CallingFrameContext *CallFrameCxt,
                      const WasmEdge_Value *In, WasmEdge_Value *Out) {
        uint32_t wasm_z_stream_ptr = (uint32_t)WasmEdge_ValueGetI32(In[0]);

        WasmEdge_MemoryInstanceContext *MemCxt =
            WasmEdge_CallingFrameGetMemoryInstance(CallFrameCxt, 0);
        wasm_z_stream *wasm_stream =
            reinterpret_cast<wasm_z_stream *>(WasmEdge_MemoryInstanceGetPointer(
                MemCxt, wasm_z_stream_ptr, sizeof(wasm_z_stream)));

        auto stream_map_it =
            reinterpret_cast<Util *>(Data)->stream_map.find(wasm_z_stream_ptr);

        if (stream_map_it == reinterpret_cast<Util *>(Data)->stream_map.end())
          throw std::runtime_error("ZStream not found in map");

        const auto z_res = Func(stream_map_it->second);
        Out[0] = WasmEdge_ValueGenI32(z_res);

        reinterpret_cast<Util *>(Data)->stream_map.erase(stream_map_it);

        return WasmEdge_Result_Success;
      }
  • I ran the test case against the host implementation and the program ran successfully, thereby verifying that this strategy works and can be implemented in the plugin. I am open to any other suggestions or implementation approaches.
  • Run command
    fathomless@vividecstasy:~/repo/wasmedge-zlib/src$ em++ module.cpp -O2 -o module.wasm -sSTANDALONE_WASM -sWARN_ON_UNDEFINED_SYMBOLS=0 -sIMPORTED_MEMORY -sINITIAL_MEMORY=128MB -sALLOW_MEMORY_GROWTH=0 && g++ -O2 host.cpp -o host -lz -lwasmedge && ./host
    Test Result : Success
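The host routines above add guest-supplied offsets such as `next_in` directly onto the linear-memory base pointer. Before a plugin hands such pointers to zlib, it may want to check that each `[offset, offset + length)` range lies inside guest memory. A minimal sketch, in which `guest_range_ok` and `translate` are hypothetical helpers rather than WasmEdge APIs, and `mem_size` is assumed to be the current linear-memory size in bytes:

```cpp
#include <cstddef>
#include <cstdint>

// Returns true when [offset, offset + len) fits inside a memory of mem_size
// bytes. The sum is done in 64 bits so that offset + len cannot wrap around.
bool guest_range_ok(std::uint32_t offset, std::uint32_t len,
                    std::uint64_t mem_size) {
  return static_cast<std::uint64_t>(offset) + len <= mem_size;
}

// Translates a guest offset into a host pointer, or nullptr if the requested
// range would fall outside guest memory.
std::uint8_t *translate(std::uint8_t *mem_base, std::uint64_t mem_size,
                        std::uint32_t offset, std::uint32_t len) {
  if (mem_base == nullptr || !guest_range_ok(offset, len, mem_size))
    return nullptr;
  return mem_base + offset;
}
```

In the `deflate`/`inflate` wrapper this would be applied to `(next_in, avail_in)` and `(next_out, avail_out)` before the call. Since `WasmEdge_MemoryInstanceGetPointer` already takes an offset and a length, calling it once per buffer may serve the same purpose.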

Any inputs and further guidance will be much appreciated.
Thank you all.

@NeelDigonto
Contributor

NeelDigonto commented Apr 3, 2023

@hydai @juntao If my approach is right, I will start cleaning the codebase and start with implementing the plugin and create a PR.

I am a 3rd year student, so I am planning to participate in GSOC 2023, tomorrow 4th April is the last date of proposal submission.

I have been looking into this issue/feature request for well over a week. I faced quite a lot of obstacles, some of which I mentioned and some of which I left out and will add as constructive criticism to improve the docs and help newcomers.

I was focused on first creating a running Proof Of Concept before trying to talk or initiate a conversation.
Had to free myself of work then understand the issue and related codebase, which delayed my response, hope you don't mind it.

It would be very kind of you if you could review it. I am also writing the GSoC proposal draft, which will contain a more detailed overview of myself, the expected project timeline, and a code explanation.

Thank you.
Have a great day.

@NeelDigonto NeelDigonto mentioned this issue Apr 4, 2023
9 tasks
@hydai
Member

hydai commented May 8, 2023

Since the GSoC 2023 declined this proposal, we will move it to LFX mentorship.

@hydai hydai added LFX Mentorship Tasks for LFX Mentorship participants and removed GSoC Tasks for Google Summer of Code participants labels May 8, 2023
@NeelDigonto
Contributor

Okay sure. I will apply on the LFX site with an improved version of the proposal.
