Skip to content

feat: Add pagination buttons (prev/next) to anywidget mode for DataFrames #1841

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 19 commits into
base: main
Choose a base branch
from

Conversation

shuoweil
Copy link
Contributor

Adding interactive pagination buttons controls (Previous/Next buttons) to the anywidget mode for BigFrames DataFrames, enabling users to navigate through large datasets page by page in Jupyter notebooks.

b/391599608

@shuoweil shuoweil requested a review from tswast June 23, 2025 23:31
@shuoweil shuoweil self-assigned this Jun 23, 2025
@shuoweil shuoweil requested review from a team as code owners June 23, 2025 23:31
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@product-auto-label product-auto-label bot added size: l Pull request size is large. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Jun 23, 2025
@shuoweil shuoweil force-pushed the shuowei-anywidget-button branch 2 times, most recently from 8125614 to 0f871a9 Compare June 25, 2025 00:28
@shuoweil shuoweil requested a review from tswast June 25, 2025 03:03
@shuoweil shuoweil added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jun 25, 2025
@bigframes-bot bigframes-bot removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jun 25, 2025
@shuoweil shuoweil force-pushed the shuowei-anywidget-button branch 2 times, most recently from 87958ac to 3f87072 Compare June 26, 2025 00:04
@shuoweil shuoweil requested a review from tswast June 26, 2025 01:12
@shuoweil shuoweil force-pushed the shuowei-anywidget-button branch 2 times, most recently from 36f89e2 to 6f70a4a Compare June 26, 2025 04:51
@shuoweil shuoweil force-pushed the shuowei-anywidget-button branch 4 times, most recently from e01a330 to 0586866 Compare June 27, 2025 02:54
@shuoweil shuoweil requested a review from tswast June 27, 2025 03:10
@shuoweil shuoweil added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jun 27, 2025
@bigframes-bot bigframes-bot removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jun 27, 2025
@shuoweil shuoweil force-pushed the shuowei-anywidget-button branch from c4642e6 to 98eba30 Compare June 27, 2025 22:17
@shuoweil shuoweil requested a review from tswast June 27, 2025 22:18
from bigframes import display

# Store the widget for _repr_mimebundle_ to use
self._anywidget_instance = display.TableWidget(self)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Consider using https://docs.python.org/3/library/weakref.html#module-weakref to avoid a circular reference here.

It's not technically a problem because Python can clean up circular references with the garbage collector (see https://uwpce-pythoncert.github.io/SystemDevelopment/weak_references.html#the-limits-of-reference-counting), it will just do so more slowly. That said, it is a best practice to avoid explicit circular references such as what is being done here. (self -> TableWidget and TableWidget -> self)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, do we want to re-use _anywidget_instance if _repr_html_ gets called more than once?

Comment on lines +28 to +36
if TYPE_CHECKING:
import anywidget
import traitlets
else:
try:
import anywidget
import traitlets
except Exception:
ANYWIDGET_INSTALLED = False
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just do the try/except here to avoid too much nesting. Other than WIDGET_BASE, I'm not seeing any references to anywidget or traitlets in the types.

Suggested change
if TYPE_CHECKING:
import anywidget
import traitlets
else:
try:
import anywidget
import traitlets
except Exception:
ANYWIDGET_INSTALLED = False
try:
import anywidget
import traitlets
except Exception:
ANYWIDGET_INSTALLED = False

dataframe: The Bigframes Dataframe to display in the widget.
"""
if not ANYWIDGET_INSTALLED:
raise ImportError("Anywidget is not installed, cannot create TableWidget.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please direct the user what to do in terms they would understand. I recommend the following:

Suggested change
raise ImportError("Anywidget is not installed, cannot create TableWidget.")
raise ImportError("Please `pip install anywidget` to use TableWidget.")


# Initialize data fetching attributes.
self._batches = dataframe.to_pandas_batches(page_size=self.page_size)
self._cached_data = pd.DataFrame(columns=self._dataframe.columns)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mutating a DataFrame can be kinda slow (lots of potential for unnecessary memory copies). Perhaps we can use a list of DataFrame instead?

Or are you trying to support dynamic changing of the page size here? If so, perhaps leave a comment explaining that.

# Initialize data fetching attributes.
self._batches = dataframe.to_pandas_batches(page_size=self.page_size)
self._cached_data = pd.DataFrame(columns=self._dataframe.columns)
self._table_id = str(uuid.uuid4())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: A comment for what this _table_id attribute purpose is would be helpful.

# Store the widget for _repr_mimebundle_ to use
self._anywidget_instance = display.TableWidget(self)
# Return a fallback HTML string
return "Interactive table widget (anywidget mode)"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be calling _anywidget_instance._repr_html_ here? Or display on the widget instance? See jupyter-widgets/ipywidgets#3843 as well as my comment in the notebook about the warning.

warnings.warn(
"Anywidget mode is not available, falling back to deferred mode."
)
return formatter.repr_query_job(self._compute_dry_run())

self._cached()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We never make it to this code or the blob_display code or the column width code that follows if they set anywidget as the mode. Are you sure that's what we want? At some point (maybe BigFrames 3.0 or maybe even earlier) we'll want to make anywidget the default, so it would be disappointing to lose these features.

Maybe we need some TODOs and follow-up issues filed to consolidate these code paths?

Comment on lines +126 to +129
self._all_data_loaded = True
# update row count if we loaded all data
if self.row_count == 0:
self.row_count = len(self._cached_data)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused what the purpose of this is. Didn't we set row_count already in the constructor?

Also, 0 is not a great placeholder for missing value, since 0 is a valid number of rows in a query result. If we do delay populating row_count, let's use None as the placeholder for missing value.

Comment on lines +131 to +132
except Exception as e:
raise RuntimeError(f"Error during batch processing: {str(e)}") from e
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious what the purpose of this is? If it's not needed, it seems like it might be better just to let the underlying exception raise.

Suggested change
except Exception as e:
raise RuntimeError(f"Error during batch processing: {str(e)}") from e

except Exception as e:
raise RuntimeError(f"Error during batch processing: {str(e)}") from e

def _get_batch_iterator(self) -> Iterator[pd.DataFrame]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we should use an @property for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. size: l Pull request size is large.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants