Silas add ranking #1498

Merged 3 commits into master from silas-add-rank on Jun 5, 2024
Conversation

@SilasMarvin (Contributor) commented Jun 3, 2024

This integration brings reranking into PostgresML through a new pgml.rank function.

pgml=# SELECT pgml.rank('mixedbread-ai/mxbai-rerank-large-v1', 'test', array_agg(md5(random()::text)), '{"return_documents": false, "top_k": 10}') FROM generate_series(1, 100);
           rank            
---------------------------
 (58,0.20096051692962646,)
 (91,0.2007983922958374,)
 (84,0.1950932741165161,)
 (83,0.1925133764743805,)
 (7,0.1918289214372635,)
 (15,0.1851881593465805,)
 (67,0.18225009739398956,)
 (94,0.1795625537633896,)
 (40,0.17863182723522186,)
 (97,0.177006796002388,)
(10 rows)
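
For readers unfamiliar with reranking, the shape of the rows above can be sketched in plain Python. This is only an illustration, not the code in this PR: `toy_score` is a hypothetical word-overlap scorer standing in for the cross-encoder model, `rank` is a hypothetical helper, and the 0-based `corpus_id` is an assumption. It shows how `top_k` truncates the sorted results and why the third field is empty when `return_documents` is false.

```python
from typing import Callable, Dict, List, Optional


def toy_score(query: str, document: str) -> float:
    """Hypothetical stand-in for a cross-encoder: share of query words found in the document."""
    query_words = set(query.lower().split())
    doc_words = set(document.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & doc_words) / len(query_words)


def rank(
    query: str,
    documents: List[str],
    score_fn: Callable[[str, str], float] = toy_score,
    top_k: Optional[int] = None,
    return_documents: bool = False,
) -> List[Dict]:
    """Score every document against the query and return the best ones first."""
    results = []
    for corpus_id, document in enumerate(documents):  # corpus_id = position in the input (assumed 0-based)
        row = {"corpus_id": corpus_id, "score": score_fn(query, document)}
        if return_documents:
            row["text"] = document  # only included on request
        results.append(row)
    # Highest score first; Python's sort is stable, so ties keep input order.
    results.sort(key=lambda row: row["score"], reverse=True)
    return results if top_k is None else results[:top_k]


documents = [
    "postgres is a database",
    "reranking orders search results",
    "search results from postgres",
]
top = rank("postgres search", documents, top_k=2)
with_text = rank("postgres search", documents, top_k=1, return_documents=True)
```

With `return_documents` left false, each row carries only the document's position and its score, which would correspond to the empty third column in the psql output above.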

@SilasMarvin SilasMarvin requested review from montanalow and kczimm June 3, 2024 18:07

if transformer not in __cache_sentence_transformer_by_name:
    __cache_sentence_transformer_by_name[transformer] = create_cross_encoder(
        transformer
Contributor

If you pass kwargs through to create_cross_encoder, we can specify the device; see https://www.sbert.net/docs/package_reference/cross_encoder/cross_encoder.html?highlight=crossencoder#sentence_transformers.cross_encoder.CrossEncoder

We should do this for the SentenceTransformer constructor, too.

Contributor Author

We will have to create a separate argument for this, or pop specific arguments from kwargs. If we just pass kwargs straight through, we will get an unexpected keyword argument error.

@kczimm (Contributor) left a comment

We should bump the version (probably to 2.9.0 since it's an API addition) and add the following migration file sql/pgml--2.8.5--2.9.0.sql:

-- src/api.rs:613
-- pgml::api::rank
CREATE  FUNCTION pgml."rank"(
	"transformer" TEXT, /* &str */
	"query" TEXT, /* &str */
	"documents" TEXT[], /* alloc::vec::Vec<&str> */
	"kwargs" jsonb DEFAULT '{}' /* pgrx::datum::json::JsonB */
) RETURNS TABLE (
	"corpus_id" bigint,  /* i64 */
	"score" double precision,  /* f64 */
	"text" TEXT  /* core::option::Option<alloc::string::String> */
)
IMMUTABLE STRICT PARALLEL SAFE 
LANGUAGE c /* Rust */
AS 'MODULE_PATHNAME', 'rank_wrapper';

@SilasMarvin
Contributor Author

Done! 745b190

@SilasMarvin SilasMarvin merged commit fb2426f into master Jun 5, 2024
1 check passed
@SilasMarvin SilasMarvin deleted the silas-add-rank branch June 5, 2024 16:32