All HF Hub posts

prithivMLmods 
posted an update 3 days ago
The demos for the MonkeyOCR Recognition model, which adopts a Structure-Recognition-Relation (SRR) triplet paradigm, and Nanonets-OCR-s, a powerful, state-of-the-art image-to-markdown OCR model that goes far beyond traditional text extraction, are combined into a single Space alongside other experimental document OCR models.

✦ Try the demo here : prithivMLmods/core-OCR
✦ Try Nanonets-OCR-s demo here : prithivMLmods/Multimodal-OCR

⤷ MonkeyOCR Recognition : echo840/MonkeyOCR
⤷ docscopeOCR-7B-050425-exp : prithivMLmods/docscopeOCR-7B-050425-exp
⤷ coreOCR-7B-050325-preview : prithivMLmods/coreOCR-7B-050325-preview
⤷ Nanonets-OCR-s : nanonets/Nanonets-OCR-s

⤷ Multimodal Implementations : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

Also included is a sample OCR test using the VisionOCR-3B-061125 and Qwen2-VL-OCR-2B-Instruct models.
⤷ Blog : https://huggingface.co/blog/prithivMLmods/visionocr-3b-061125-vs-qwen2-vl-ocr-2b-instruct

To learn more, visit the model card of the respective model.
MonsterMMORPG 
posted an update 1 day ago
WAN 2.1 FusionX + Self Forcing LoRA are the New Best of Local Video Generation with Only 8 Steps + FLUX Upscaling Guide : https://www.youtube.com/watch?v=Xbn93GRQKsQ

Tutorial : https://www.youtube.com/watch?v=Xbn93GRQKsQ

Video Chapters

0:00 Introduction to the New FusionX Video Model & FLUX Upscaling
0:30 One-Click Presets & The SwarmUI Model Downloader Explained
1:07 Achieving Hyper-Realism with the FLUX 2x Latent Upscale Preset
1:58 How to Download & Install the SwarmUI Model Downloader
2:49 Downloading Full Models vs. Downloading Just The LoRAs
3:48 Final Setup: Updating SwarmUI & Importing The New Presets
4:32 Generating a Video: Applying the FusionX Image-to-Video Preset
5:03 Critical Step: Correcting The Model's Native Resolution Metadata
5:55 Finalizing Image-to-Video Settings (Frame Count & RIFE Interpolation)
6:49 Troubleshooting Performance: Identifying Low GPU Usage & Shared VRAM Bug
8:35 The Solution: Disabling Sage Attention for Image-to-Video Models
10:02 Final Result: Showcasing The Amazing HD Quality Animation
10:40 How to Use the FusionX Text-to-Video Model with Presets
11:49 Text-to-Video Result & Quality Comparison
12:08 How to Use the FusionX LoRA with the Base Wan 2.1 Model
13:07 FLUX Tutorial: Downloading The Required Upscaler & Face Models
13:48 Generating a High-Quality Image with The Official FLUX Preset
14:50 Using Automatic Face Segmentation & Inpainting with FLUX
16:05 The Ultimate Upgrade: Applying The FLUX 2x Latent Upscaler Preset
16:32 Final Result: Comparing Standard vs. 2x Upscaled Image Quality
16:50 Outro & Sneak Peek of The New Ultimate Video Processing App
merve 
posted an update 1 day ago
Releases of the past week are here: merve/releases-june-13-6852c3c1eaf1e0c24c958860

Here are our picks 🤓
So many interesting models were released this past week in open AI! 🤖

🖼️ Computer Vision/VLMs
> nanonets/Nanonets-OCR-s is the new state-of-the-art OCR model that can handle checkboxes, watermarks, tables (OS)
> Meta released facebook/v-jepa-2-6841bad8413014e185b497a6, new sota video embeddings with two new classification models (OS)
> ByteDance-Seed/SeedVR2-3B is a new 3B video restoration model (OS)

Audio
> Stepfun released stepfun-ai/Step-Audio-AQAA, new large (137B 🤯) audio language model that takes in audio and generates audio (OS)

🤖 Robotics
> nvidia released nvidia/GR00T-N1.5-3B, new open foundation vision language action model

3D
> tencent/Hunyuan3D-2.1 is the new version of Hunyuan by Tencent that can generate 3D assets from text and image prompts
openfree 
posted an update 2 days ago
🎯 Open GAMMA - AI PPT Generator 'GamJa'

🚀 Project Introduction
Revolutionary AI presentation generator presented by OpenFree AI Community! Create professional-level PPTs with just a few clicks.
🆓 Completely FREE! Create Premium PPTs with Free GAMMA! 🎉

DEMO: openfree/Open-GAMMA

✨ Key Features

🤖 Powered by FACTS Grounding Leaderboard 2nd RANK LLM
Base Model: vidraft/gemma-3-R1984-27B
Perfect support for English/Korean/Multi-language
Automatic speaker notes generation

🎨 Premium Visuals
3D style AI image generation
5 design themes (Professional, Modern, Nature, Creative, Minimal)
FLUX style diagram images
Automatic emoji bullet points

📊 Smart Diagrams
Process Flow, Concept Map, WBS, Radial, Synoptic Chart
Content analysis-based automatic diagram generation
Perfect Korean font support

💡 Main Features

📝 Intelligent Content Generation
Auto-generate 3-20 slides just by entering a topic
Latest information through web search
Reference PDF, CSV, TXT files


🖼️ Visual Automation
3D images for cover & conclusion slides
Auto-generate 2 content-based diagrams
Add 2 FLUX style images


🎯 Customizable Design
5 professional themes
3 layout styles
Automatic emoji mapping system

💰 Premium Features for FREE!
Create professional-grade presentations with Free GAMMA (Open GAMMA) that rivals paid PPT generation services! 🚀
ghostai1 
posted an update 2 days ago
# The Societal Impact of Reinforcement Learning: A Deep Dive

Artificial Intelligence, or AI, is revolutionizing the way we live, work, and interact with our environment. With advancements in Reinforcement Learning (RL), machines are becoming increasingly intelligent and capable of making decisions autonomously. This shift is having a significant impact on society as we know it.

One of the most notable aspects of RL is its ability to learn from experience. By observing and interacting with its surroundings, an AI-driven RL system can adapt to new situations and make decisions based on real-world data. This has huge implications for industries like healthcare, where AI can be used to analyze patient data and provide personalized treatment plans, or finance, where it can help predict market trends and make more informed investment decisions.
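To make "learning from experience" concrete, here is a minimal sketch of an epsilon-greedy agent on a toy three-armed bandit. Everything in it is an assumption made for illustration: the function name `run_bandit`, the reward probabilities, the step count, and the exploration rate are all hypothetical and are not taken from any system mentioned in this post.

```python
import random


def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a toy bandit (illustrative values only)."""
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    counts = [0] * n_arms        # how often each action was tried
    estimates = [0.0] * n_arms   # running average reward per action

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the action that currently looks best.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # The environment returns a reward of 1 or 0.
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0

        # Update the running average with the new experience.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts


estimates, counts = run_bandit([0.2, 0.5, 0.8])
best_arm = max(range(len(estimates)), key=lambda a: estimates[a])
print("estimated values:", [round(v, 2) for v in estimates])
print("preferred action:", best_arm)
```

The agent is never told which action pays best. It tries actions, observes rewards, and gradually shifts toward the one with the highest estimated value. Real RL systems deal with states, delayed rewards, and function approximation, none of which this toy covers.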

Furthermore, RL is driving innovation in robotics and automation. Autonomous vehicles, for example, rely on RL to navigate complex environments safely and efficiently. Similarly, manufacturing processes are being automated with RL-powered robots that can learn and improve their performance over time.

While these advancements bring countless benefits, they also raise concerns about privacy, security, and job displacement. It's crucial that we continue to develop ethical guidelines for AI usage and invest in reskilling programs to help workers transition into new roles as automation becomes more prevalent.

In conclusion, the societal impact of AI-driven Reinforcement Learning is vast and multifaceted. From healthcare to finance, transportation to manufacturing, RL is transforming industries and shaping our future in ways we've only just begun to comprehend. As we continue to harness the power of this technology, it's important that we also consider its implications and strive to create a world where AI enhances human potential, rather than replaces it.
multimodalart 
posted an update about 21 hours ago
Self-Forcing, a real-time video model distilled from Wan 2.1 by @adobe, is out, and they open-sourced it 🐐

I've built a live real-time demo on Spaces 📹💨

multimodalart/self-forcing
merve 
posted an update 3 days ago
IN: video fine-tuning support for facebook V-JEPA 2 in HF transformers 🔥

it comes with
> four models fine-tuned on the Diving48 and SSv2 datasets facebook/v-jepa-2-6841bad8413014e185b497a6
> FastRTC demo on V-JEPA2 SSv2 qubvel-hf/vjepa2-streaming-video-classification
> fine-tuning script on UCF-101 https://gist.github.com/ariG23498/28bccc737c11d1692f6d0ad2a0d7cddb
> fine-tuning notebook on UCF-101 https://colab.research.google.com/drive/16NWUReXTJBRhsN3umqznX4yoZt2I7VGc?usp=sharing
we're looking forward to seeing what you will build! 🤗
clem 
posted an update about 13 hours ago
Jaward 
posted an update 1 day ago
yjernite 
posted an update 3 days ago