ModuleNotFoundError running dashboard Tutorial1 step-by-step #993

Closed
@gregn123

I built the PostgresML extension and dashboard from your GitHub source (v2.7.4 tag), then deployed and configured them on a RHEL 8.8 Linux VM.

When running Tutorial 1 step by step in the dashboard, step 42 failed with the following error:

error returned from database: called `Result::unwrap()` on an `Err` value: PyErr { type: <class 'ModuleNotFoundError'>, value: ModuleNotFoundError("No module named 'sklearn'"), traceback: Some(<traceback object at 0x7fe997260ec0>) }

Caused by:
    called `Result::unwrap()` on an `Err` value: PyErr { type: <class 'ModuleNotFoundError'>, value: ModuleNotFoundError("No module named 'sklearn'"), traceback: Some(<traceback object at 0x7fe997260ec0>) }

From reading the Rust source code (https://github.com/postgresml/postgresml/blob/v2.7.4/pgml-extension/src/bindings/sklearn.rs), it seems that some pgml.* functions do not first activate the Python virtual environment, which looks like a bug.

For example, the above error does not occur if I prefix the Step 42 query in the notebook cell with:

SELECT pgml.validate_python_dependencies();

(which I know DOES activate the Python virtual environment).
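The prefix workaround above can be sketched as a small helper that prepends the activation statement to a notebook cell's SQL before it is sent to the database. This is only an illustration: `with_venv_activation` and `ACTIVATION_SQL` are hypothetical names, not part of PostgresML or the dashboard, and the only real identifier used is `pgml.validate_python_dependencies()`.

```python
# Hypothetical helper, not part of PostgresML: prepend the activation
# statement so it runs in the same session as the cell's own query.
ACTIVATION_SQL = "SELECT pgml.validate_python_dependencies();"


def with_venv_activation(cell_sql: str) -> str:
    """Return cell_sql with the activation statement prepended at most once."""
    if cell_sql.lstrip().startswith(ACTIVATION_SQL):
        # Already prefixed; leave the cell untouched.
        return cell_sql
    return f"{ACTIVATION_SQL}\n{cell_sql}"
```

Calling the helper twice on the same cell leaves it unchanged after the first call, so it would be safe to apply to every cell.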

Alternatively, without the workaround above, installing the following packages in the global Python environment also avoids the error:

scikit-learn==1.3.0
xgboost==1.7.6
lightgbm==4.0.0
catboost==1.2
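To see which of these packages a given interpreter can actually find, a quick check along these lines may help. It is a minimal sketch: `missing_modules` is a hypothetical helper, the names are the import names of the four packages listed above, and the result only describes the interpreter that runs the script, which is not necessarily the one embedded in the Postgres backend.

```python
import importlib.util
import sys

# Import names for scikit-learn, xgboost, lightgbm and catboost.
REQUIRED_MODULES = ("sklearn", "xgboost", "lightgbm", "catboost")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names this interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Show which interpreter is being inspected, then what it lacks.
    print("interpreter:", sys.executable)
    print("missing:", missing_modules())
```

Running it once with the global interpreter and once with the virtual environment's interpreter should show whether the packages are installed only in the latter.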

A colleague has since run Tutorial 1 step by step on v2.7.8 and reproduced the issue. I have also looked at the v2.7.8 code, and the issue still seems to be present.
The v2.7.8 dashboard lets you run ALL of the steps in one go, and that DOES work for Tutorial 1. I think that is because the Python virtual environment is activated in an earlier step and, since every step runs on the same database connection, the activation carries over to subsequent steps.
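The hypothesis above can be illustrated with a toy model. None of this is PostgresML or dashboard code: `ToyConnection`, `run_all` and `run_step_by_step` are made-up names, and the assumption that each step-by-step cell gets a fresh session is mine, made only to show how per-connection activation would produce the behaviour I see.

```python
class ToyConnection:
    """Stand-in for one database session (toy model, not PostgresML code)."""

    def __init__(self):
        # Assumption: activation state lives on the session and starts off.
        self.venv_active = False

    def run(self, statement):
        if statement == "activate":
            self.venv_active = True
            return "ok"
        if statement == "train":
            if not self.venv_active:
                raise ModuleNotFoundError("No module named 'sklearn'")
            return "trained"
        raise ValueError(f"unknown statement: {statement}")


def run_all(statements):
    """Run every statement on one shared session."""
    conn = ToyConnection()
    return [conn.run(s) for s in statements]


def run_step_by_step(statements):
    """Run each statement on a fresh session (assumed, not confirmed)."""
    results = []
    for s in statements:
        conn = ToyConnection()
        try:
            results.append(conn.run(s))
        except ModuleNotFoundError as exc:
            results.append(f"{type(exc).__name__}: {exc}")
    return results
```

In this model, `run_all(["activate", "train"])` succeeds, while `run_step_by_step(["activate", "train"])` fails on the second step because the activation from the first session is not visible to the second.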

Can you confirm this is a bug?
