
Building a Semantic Search Engine and Open-Status Classifier over the ResearchMath-14k Dataset

By primereports · June 5, 2026


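from sentence_transformers import util

# `model`, `emb`, `work`, and TEXT_COL are assumed to come from earlier steps of the
# tutorial (not shown here): the sentence encoder, the corpus embeddings, the
# (possibly sampled) DataFrame, and the name of its text column.
def search(query, k=5):
    """Print the k dataset rows most similar to `query` by cosine similarity."""
    q = model.encode([query], normalize_embeddings=True)
    sims = util.cos_sim(q, emb)[0].cpu().numpy()
    idx = sims.argsort()[::-1][:k]  # indices of the k highest scores, best first
    print(f'\n=== Query: "{query}" ===')
    for rank, i in enumerate(idx, 1):
        row = work.iloc[i]
        print(f"\n[{rank}] sim={sims[i]:.3f} | {row['taxonomy_level_1']} "
              f"| status={row['open_status']}")
        # Show only the first 260 characters, flattened onto one line.
        print("   ", row[TEXT_COL][:260].replace("\n", " "), "...")

search("rational points on hyperelliptic curves")
search("multiplicativity of maximal output p-norm of a quantum channel")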

import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, ConfusionMatrixDisplay

# Predict open_status from the embeddings. RANDOM_STATE is assumed to be set earlier.
y = work["open_status"].values
# Stratify so each status label keeps the same proportion in train and test.
Xtr, Xte, ytr, yte = train_test_split(
    emb, y, test_size=0.25, random_state=RANDOM_STATE, stratify=y)
# class_weight="balanced" reweights classes inversely to their frequency.
clf = LogisticRegression(max_iter=2000, class_weight="balanced", C=2.0)
clf.fit(Xtr, ytr)
pred = clf.predict(Xte)
print("\n=== open_status classifier (embeddings + logistic regression) ===")
print(classification_report(yte, pred))

# normalize="true" divides each row by its total, so the diagonal shows per-class recall.
fig, ax = plt.subplots(figsize=(7, 6))
ConfusionMatrixDisplay.from_predictions(
    yte, pred, ax=ax, cmap="Blues", xticks_rotation=45,
    normalize="true", values_format=".2f")
ax.set_title("open_status confusion matrix (row-normalized)")
plt.tight_layout()
plt.show()
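
import numpy as np

# Find the most similar pair of distinct rows. The pairwise matrix is N x N, so it
# grows quadratically (roughly 0.8 GB at 14.1k rows if the embeddings are float32).
sims = util.cos_sim(emb, emb).cpu().numpy()
# Mask self-matches with -1, the minimum cosine, so the diagonal can never win.
np.fill_diagonal(sims, -1.0)
i, j = np.unravel_index(sims.argmax(), sims.shape)
print(f"\nMost similar pair (cos={sims[i, j]:.3f}):")
for n in (i, j):
    print(f"\n  paper_id={work.iloc[n]['paper_id']} | "
          f"{work.iloc[n]['taxonomy_level_1']}")
    print("   ", work.iloc[n][TEXT_COL][:240].replace("\n", " "), "...")
print("\nDone. Set SAMPLE_SIZE=None at the top to run on the full 14.1k rows.")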
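
The ranking step inside search() can be checked in isolation, without loading a model or the dataset. The following is a minimal NumPy-only sketch, not the article's code: the function name top_k_cosine and the array toy_emb are hypothetical, introduced only for illustration. It assumes the embeddings are rows of a 2-D array, L2-normalizes both sides so that a dot product equals cosine similarity, and uses np.argpartition to avoid sorting every score when only the top k are needed.

```python
import numpy as np

def top_k_cosine(query_vec, matrix, k=5):
    """Return (indices, scores) of the k rows of `matrix` closest to `query_vec`.

    Hypothetical helper for illustration; the article itself uses util.cos_sim.
    """
    # L2-normalize both sides so that a plain dot product is the cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    scores = m @ q
    k = min(k, len(scores))
    # argpartition places the k best candidates first without a full sort ...
    top = np.argpartition(-scores, k - 1)[:k]
    # ... then only those k candidates are sorted, best first.
    top = top[np.argsort(-scores[top])]
    return top, scores[top]

# Toy 2-D "embeddings" (made-up values, not taken from ResearchMath-14k).
toy_emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
idx, scores = top_k_cosine(np.array([2.0, 0.1]), toy_emb, k=2)
print(idx, scores)
```

On a few thousand rows the full argsort used in search() is perfectly adequate; the partition-based variant is likely to matter only when the corpus is much larger than k.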
