-
Notifications
You must be signed in to change notification settings - Fork 233
fix: protobuf (de)ser for docvec #1639
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Johannes Messner <[email protected]>
Signed-off-by: Johannes Messner <[email protected]>
Signed-off-by: Johannes Messner <[email protected]>
# handle values that were None before serialization | ||
tensor_columns[tens_col_name] = None | ||
else: | ||
# TODO(johannes): handle torch, tf, numpy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i will do this in a separate PR, it might require a proto change (but not sure)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that it require a proto change. I think here you should look at the tensor type of this field in the doc type and use it to load the proto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the same way you do it for doc_columns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's discuss that in a separate PR
Signed-off-by: Johannes Messner <[email protected]>
Signed-off-by: Johannes Messner <[email protected]>
Signed-off-by: Johannes Messner <[email protected]>
Signed-off-by: Johannes Messner <[email protected]>
Signed-off-by: Johannes Messner <[email protected]>
📝 Docs are deployed on https://ft-fix-docvec-proto--jina-docs.netlify.app 🎉 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, added on small comment
closes #1561
The problem here was that deser from proto to docvec did not acutally deserialize the columns. Instead, it stored raw protos as columns instead of deserialized values.
A special case is when a column is
None
, and this PR introduces a convention of how those columns are represented in the proto. This avoid having to change the proto definition.Limitation: Right now this only works for np columns, not tf or torch. Support for those two will come in a separate PR.
Breaking Change: The .proto definition is changed in this PR. I added
ListOfDocVecProto
to properly represent a DocVec's doc_vec_columns, and adjustedDocVecProto
accordingly. This will break backwards compatibility for DocVecs serialiazed using older versions of docarray. But those older version were broken in that regard anyways, so it should not be an issue.TODO:
None
handling for