-
Notifications
You must be signed in to change notification settings - Fork 50
feat: Add pagination buttons (prev/next) to anywidget mode for DataFrames #1841
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
8125614
to
0f871a9
Compare
87958ac
to
3f87072
Compare
36f89e2
to
6f70a4a
Compare
e01a330
to
0586866
Compare
c4642e6
to
98eba30
Compare
from bigframes import display | ||
|
||
# Store the widget for _repr_mimebundle_ to use | ||
self._anywidget_instance = display.TableWidget(self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Consider using https://docs.python.org/3/library/weakref.html#module-weakref to avoid a circular reference here.
It's not technically a problem because Python can clean up circular references with the garbage collector (see https://uwpce-pythoncert.github.io/SystemDevelopment/weak_references.html#the-limits-of-reference-counting), it will just do so more slowly. That said, it is a best practice to avoid explicit circular references such as what is being done here. (self -> TableWidget and TableWidget -> self)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, do we want to re-use _anywidget_instance
if _repr_html_
gets called more than once?
if TYPE_CHECKING: | ||
import anywidget | ||
import traitlets | ||
else: | ||
try: | ||
import anywidget | ||
import traitlets | ||
except Exception: | ||
ANYWIDGET_INSTALLED = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just do the try/except here to avoid too much nesting. Other than WIDGET_BASE, I'm not seeing any references to anywidget
or traitlets
in the types.
if TYPE_CHECKING: | |
import anywidget | |
import traitlets | |
else: | |
try: | |
import anywidget | |
import traitlets | |
except Exception: | |
ANYWIDGET_INSTALLED = False | |
try: | |
import anywidget | |
import traitlets | |
except Exception: | |
ANYWIDGET_INSTALLED = False |
dataframe: The Bigframes Dataframe to display in the widget. | ||
""" | ||
if not ANYWIDGET_INSTALLED: | ||
raise ImportError("Anywidget is not installed, cannot create TableWidget.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please direct the user what to do in terms they would understand. I recommend the following:
raise ImportError("Anywidget is not installed, cannot create TableWidget.") | |
raise ImportError("Please `pip install anywidget` to use TableWidget.") |
|
||
# Initialize data fetching attributes. | ||
self._batches = dataframe.to_pandas_batches(page_size=self.page_size) | ||
self._cached_data = pd.DataFrame(columns=self._dataframe.columns) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mutating a DataFrame can be kinda slow (lots of potential for unnecessary memory copies). Perhaps we can use a list of DataFrame instead?
Or are you trying to support dynamic changing of the page size here? If so, perhaps leave a comment explaining that.
# Initialize data fetching attributes. | ||
self._batches = dataframe.to_pandas_batches(page_size=self.page_size) | ||
self._cached_data = pd.DataFrame(columns=self._dataframe.columns) | ||
self._table_id = str(uuid.uuid4()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: A comment for what this _table_id
attribute purpose is would be helpful.
# Store the widget for _repr_mimebundle_ to use | ||
self._anywidget_instance = display.TableWidget(self) | ||
# Return a fallback HTML string | ||
return "Interactive table widget (anywidget mode)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be calling _anywidget_instance._repr_html_
here? Or display on the widget instance? See jupyter-widgets/ipywidgets#3843 as well as my comment in the notebook about the warning.
warnings.warn( | ||
"Anywidget mode is not available, falling back to deferred mode." | ||
) | ||
return formatter.repr_query_job(self._compute_dry_run()) | ||
|
||
self._cached() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We never make it to this code or the blob_display
code or the column width code that follows if they set anywidget as the mode. Are you sure that's what we want? At some point (maybe BigFrames 3.0 or maybe even earlier) we'll want to make anywidget
the default, so it would be disappointing to lose these features.
Maybe we need some TODOs and follow-up issues filed to consolidate these code paths?
self._all_data_loaded = True | ||
# update row count if we loaded all data | ||
if self.row_count == 0: | ||
self.row_count = len(self._cached_data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused what the purpose of this is. Didn't we set row_count
already in the constructor?
Also, 0 is not a great placeholder for missing value, since 0 is a valid number of rows in a query result. If we do delay populating row_count
, let's use None
as the placeholder for missing value.
except Exception as e: | ||
raise RuntimeError(f"Error during batch processing: {str(e)}") from e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious what the purpose of this is? If it's not needed, it seems like it might be better just to let the underlying exception raise.
except Exception as e: | |
raise RuntimeError(f"Error during batch processing: {str(e)}") from e |
except Exception as e: | ||
raise RuntimeError(f"Error during batch processing: {str(e)}") from e | ||
|
||
def _get_batch_iterator(self) -> Iterator[pd.DataFrame]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like we should use an @property
for this.
Adding interactive pagination buttons controls (Previous/Next buttons) to the anywidget mode for BigFrames DataFrames, enabling users to navigate through large datasets page by page in Jupyter notebooks.
b/391599608