Prime Reports

Building a Semantic Search Engine and Open-Status Classifier over the ResearchMath-14k Dataset

By primereports | June 5, 2026

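# Assumes model, emb, work and TEXT_COL were defined in an earlier step
# (encoder, corpus embeddings, DataFrame, and text column name).
from sentence_transformers import util

def search(query, k=5):
    """Print the k entries most similar to the query."""
    # Normalized query embedding, so scores are comparable cosine similarities
    q = model.encode([query], normalize_embeddings=True)
    sims = util.cos_sim(q, emb)[0].cpu().numpy()
    idx = sims.argsort()[::-1][:k]  # indices of the k highest similarities
    print(f'\n=== Query: "{query}" ===')
    for rank, i in enumerate(idx, 1):
        row = work.iloc[i]
        print(f"\n[{rank}] sim={sims[i]:.3f} | {row['taxonomy_level_1']} "
              f"| status={row['open_status']}")
        # One-line preview of the entry text
        print("   ", row[TEXT_COL][:260].replace("\n", " "), "...")

search("rational points on hyperelliptic curves")
search("multiplicativity of maximal output p-norm of a quantum channel")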
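
The search function above ranks entries by cosine similarity. When vectors are scaled to unit length, that score is a plain dot product, and the top k can be picked without sorting every score. The following is a minimal NumPy-only sketch of the idea on made-up 3-dimensional vectors; the helper name top_k_cosine and the toy data are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def top_k_cosine(query_vec, corpus, k=5):
    """Return (indices, scores) of the k corpus rows most similar to query_vec.

    Hypothetical helper: normalizes defensively so it also works on raw vectors.
    """
    q = query_vec / np.linalg.norm(query_vec)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = c @ q                       # cosine similarity of each row with q
    k = min(k, len(scores))
    # argpartition selects the k best in linear time; only those are sorted
    top = np.argpartition(-scores, k - 1)[:k]
    top = top[np.argsort(-scores[top])]
    return top, scores[top]

# Toy "embeddings" (made up for illustration)
corpus = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
query = np.array([2.0, 0.1, 0.0])

idx, scores = top_k_cosine(query, corpus, k=2)
print(idx, np.round(scores, 3))
```

On a corpus of a few thousand rows a full argsort is perfectly adequate; the partial selection only starts to matter when the index grows much larger.
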
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, ConfusionMatrixDisplay
from sklearn.model_selection import train_test_split

# Predict open_status from the embeddings; RANDOM_STATE is assumed to be set earlier
y = work["open_status"].values
# Stratified split keeps class proportions the same in train and test
Xtr, Xte, ytr, yte = train_test_split(
    emb, y, test_size=0.25, random_state=RANDOM_STATE, stratify=y)
# class_weight="balanced" reweights classes inversely to their frequency
clf = LogisticRegression(max_iter=2000, class_weight="balanced", C=2.0)
clf.fit(Xtr, ytr)
pred = clf.predict(Xte)
print("\n=== open_status classifier (embeddings + logistic regression) ===")
print(classification_report(yte, pred))

# Row-normalized confusion matrix: each true-label row sums to 1
fig, ax = plt.subplots(figsize=(7, 6))
ConfusionMatrixDisplay.from_predictions(
    yte, pred, ax=ax, cmap="Blues", xticks_rotation=45,
    normalize="true", values_format=".2f")
ax.set_title("open_status confusion matrix (row-normalized)")
plt.tight_layout()
plt.show()
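
The plot above uses normalize="true", which divides each row of the confusion matrix by the number of test samples whose true label is that row's class. The diagonal then reads directly as per-class recall. A small NumPy-only sketch of that computation follows; the function name and the three label strings are placeholders chosen for illustration and are not claimed to be the dataset's actual open_status values.

```python
import numpy as np

def row_normalized_confusion(y_true, y_pred, labels):
    """Confusion matrix whose rows (true classes) each sum to 1."""
    index = {label: i for i, label in enumerate(labels)}
    counts = np.zeros((len(labels), len(labels)), dtype=float)
    for t, p in zip(y_true, y_pred):
        counts[index[t], index[p]] += 1
    row_totals = counts.sum(axis=1, keepdims=True)
    # Leave all-zero rows at zero for classes absent from y_true
    return np.divide(counts, row_totals, out=np.zeros_like(counts),
                     where=row_totals > 0)

labels = ["open", "partial", "resolved"]    # placeholder label names
y_true = ["open", "open", "open", "open",
          "partial", "partial", "resolved", "resolved"]
y_pred = ["open", "open", "open", "partial",
          "partial", "open", "resolved", "resolved"]

cm = row_normalized_confusion(y_true, y_pred, labels)
recall = np.diag(cm)                         # per-class recall
for label, r in zip(labels, recall):
    print(f"{label:>9}: recall={r:.2f}")
```

Reading the matrix row by row this way keeps a rare class from being visually swamped by a frequent one, which is the same concern that class_weight="balanced" addresses during training.
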
import numpy as np

# All-pairs cosine similarity over the corpus (an n x n matrix)
sims = util.cos_sim(emb, emb).cpu().numpy()
np.fill_diagonal(sims, -np.inf)  # exclude each row's similarity with itself
i, j = np.unravel_index(sims.argmax(), sims.shape)
print(f"\nMost similar pair (cos={sims[i, j]:.3f}):")
for n in (i, j):
    print(f"\n  paper_id={work.iloc[n]['paper_id']} | "
          f"{work.iloc[n]['taxonomy_level_1']}")
    print("   ", work.iloc[n][TEXT_COL][:240].replace("\n", " "), "...")
print("\nDone. Set SAMPLE_SIZE=None at the top to run on the full 14.1k rows.")
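
The all-pairs step above builds the whole n x n similarity matrix at once. On the full 14.1k rows that is about 199 million entries, or roughly 0.8 GB if they are stored as float32. If memory is tight, one option is to scan the matrix in row blocks and keep only the best pair seen so far. The sketch below assumes nonzero embedding rows; the function name, block size, and random toy data are illustrative and not from the original notebook.

```python
import numpy as np

def most_similar_pair(emb, block_size=1024):
    """Return (i, j, cosine) for the most similar pair of distinct rows.

    Processes block_size rows at a time, so peak memory is
    block_size x n instead of n x n. Assumes rows are nonzero.
    """
    x = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    n = len(x)
    best = (-1, -1, -np.inf)
    for start in range(0, n, block_size):
        stop = min(start + block_size, n)
        sims = x[start:stop] @ x.T                  # (block, n) cosine scores
        # Mask self-similarity so a row never matches itself
        rows = np.arange(stop - start)
        sims[rows, rows + start] = -np.inf
        r, j = np.unravel_index(sims.argmax(), sims.shape)
        if sims[r, j] > best[2]:
            best = (start + int(r), int(j), float(sims[r, j]))
    return best

rng = np.random.default_rng(0)
toy = rng.normal(size=(500, 16))
toy[123] = toy[42] + 0.01 * rng.normal(size=16)     # plant a near-duplicate pair
i, j, cos = most_similar_pair(toy, block_size=64)
print(i, j, round(cos, 4))
```

The planted near-duplicate should come back as the top pair, in either order. A very high similarity on real data is often a sign of duplicated or near-duplicated entries, which is worth checking before trusting the classifier's test scores.
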
