A few days ago I had the idea for an “OCR translation overlay” that was supposed to capture any text on the screen and overlay its translation. Primarily for text that is embedded in an image or an application where copy-pasting is not possible or a chore.
Existing Solutions
Naturally, being lazy, I searched for existing solutions that could solve my problem.
translation-overlay
- Only 2 specific games
- Code is older than 5 years. I want something fresh
universal game translator
Actually exactly what I wanted. But:
It’s hacked to only work on Windows (due to the low level nature of writing something that can do screen captures) but in theory those pieces could be abstracted out to be more platform agnostic.
HA. Move along, C++ peasant. I, Amiron, can do better than that! That’s what you get for not using the superior, cross-platform-capable .NET Core™
I-I just want an excuse for a small hobby project (´。_。`)
Rolling My Own Solution
The plan was simple: Capture the screen > OCR > Translate > Render to screen. EZ PZ, 2 days at most if I just substitute the actual hard work with existing NuGet libraries.
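A minimal sketch of that four-step plan, assuming each stage is just a delegate that can be swapped out. The names `TextBlock` and `TranslationPipeline` are hypothetical and are not taken from the actual project. The capture, OCR and translate stages are placeholders for whatever library ends up doing the real work.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type: one recognized text block plus its position on screen.
public sealed class TextBlock
{
    public TextBlock(string text, int x, int y, int width, int height)
    {
        Text = text; X = x; Y = y; Width = width; Height = height;
    }

    public string Text { get; }
    public int X { get; }
    public int Y { get; }
    public int Width { get; }
    public int Height { get; }
}

// Hypothetical type: wires capture > OCR > translate together.
// Rendering is left to the caller, which receives the translated blocks.
public sealed class TranslationPipeline
{
    private readonly Func<byte[]> _capture;
    private readonly Func<byte[], IReadOnlyList<TextBlock>> _ocr;
    private readonly Func<string, string> _translate;

    public TranslationPipeline(
        Func<byte[]> capture,
        Func<byte[], IReadOnlyList<TextBlock>> ocr,
        Func<string, string> translate)
    {
        _capture = capture;
        _ocr = ocr;
        _translate = translate;
    }

    public IReadOnlyList<TextBlock> Run()
    {
        byte[] image = _capture();
        var translated = new List<TextBlock>();
        foreach (TextBlock block in _ocr(image))
        {
            // Keep the original position so the translation can be drawn on top of the source text.
            translated.Add(new TextBlock(
                _translate(block.Text), block.X, block.Y, block.Width, block.Height));
        }
        return translated;
    }
}
```

Keeping the stages behind delegates means the screen capture or the OCR provider can be replaced later without touching the rest.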
I had already decided on using Google Cloud for OCR and Translation, being familiar with it through other projects. The only thing I didn’t know was how to render freestyle graphics onto the screen.
How 2 Text 2 Screen?
Well, how do you render anything? DirectX? OpenGL? Using them directly seems like a lot of work. Also I don’t want to educate myself, so we need to go more abstract and high level. Looking around Google, one of the first things I found was Veldrid.
Veldrid is a cross-platform, graphics API-agnostic rendering and compute library for .NET. It provides a powerful, unified interface to a system’s GPU and includes more advanced features than any other .NET library. Unlike other platform- or vendor-specific technologies, Veldrid can be used to create high-performance 3D applications that are truly portable.
Supported backends:
Direct3D 11
Vulkan
Metal
OpenGL 3
OpenGL ES 3
Neato! My cross platform dreams are coming to life. Time to read the getting started and how to render text…
Hmm, Create the window and graphics device, ok seems simple. Yes, go on
Creating graphics resources, Create the shaders, VertElements, Buffers. Uhhh, that’s a lot of things I don’t understand and don’t want to learn. I hope it’s just boilerplate that is used once…
And the result of that is what?
Oh, all that is for a colored rectangle. Wait. So if I want to render ANYTHING I actually need to understand how to load textures and declare the correct vertices to get shapes out? How do I handle fonts? Do I need a texture for every character? How do I get the right spacing? Haha, my poor webdev brain says: Is there an npm package for that? Welp, time to find an alternative to Veldrid that is more high level.
Frantic googling for alternatives
Dear ImGui is a bloat-free graphical user interface library for C++. It outputs optimized vertex buffers that you can render anytime in your 3D-pipeline enabled application. It is fast, portable, renderer agnostic and self-contained (no external dependencies).
Examples pls.
That looks amazing! Time to misuse a whole GUI library just to render text. (Insert webdev website bloat joke). Good thing someone wrote a .NET wrapper (ImGui.NET) for it. Well, bye bye Veldrid!
Cool, it even has an example program.
static void Main(string[] args)
{
    // Create window, GraphicsDevice, and all resources necessary for the demo.
    VeldridStartup.CreateWindowAndGraphicsDevice(
        new WindowCreateInfo(50, 50, 1280, 720, WindowState.Normal, "ImGui.NET Sample Program"),
        new GraphicsDeviceOptions(true, null, true),
        out _window,
        out _gd);
    _window.Resized += () =>
    ...
Oh. Made by the same guy as Veldrid. And if I had bothered to scroll down on the Veldrid tutorial, I would have known about ImGui.NET earlier.
Anyway, ImGui is an immediate mode GUI: the render instructions are re-issued every frame to display the graphics, akin to the way Unity handles “Gizmos”. Using ImGui to render a text box with a transparent background was really easy once I got used to how ImGui handles “style” instructions:
foreach (var activeText in _activeTexts)
{
    ImGui.SetNextWindowSize(new Vector2(activeText.Area.Width, activeText.Area.Height));
    ImGui.SetNextWindowPos(new Vector2(activeText.Area.X, activeText.Area.Y));
    ImGui.PushStyleVar(ImGuiStyleVar.Alpha, 0.8f);
    var name = activeText.GetHashCode().ToString();
    ImGui.Begin(name, ImGuiWindowFlags.NoDecoration | ImGuiWindowFlags.NoMove);
    ImGui.PopStyleVar();
    ImGui.PushStyleVar(ImGuiStyleVar.Alpha, 1.0f);
    ImGui.BeginChild(name + "text");
    ImGui.Text(activeText.Text);
    ImGui.EndChild();
    ImGui.PopStyleVar();
    ImGui.End();
}
Getting Opacity To Work
So far so good. Now how do I get the actual program background to be transparent? Setting the background color to fully transparent does nothing… But the Veldrid window has an Opacity property. Let’s turn that down.
That’s kinda what I expected to happen. Still, I found several blog posts by people who made the background of their OpenGL applications transparent, and that worked for them. But no matter what backend I force in Veldrid, the window background transparency doesn’t change.
With 7 billion people on this planet, somebody else must have solved this already with Veldrid. And someone did: zaafar/ClickableTransparentOverlay
1:1 the same kind of libraries I’m using, sweet. It has even been published as a NuGet package. But multiple things I need are still missing. Let’s copy-paste it; I will get back to this later and make a pull request once I’m done prototyping.
So how did they set the window to be transparent?
NativeMethods.InitTransparency(window.Handle);
NativeMethods.SetOverlayClickable(window.Handle, false);
And also a solution for making the overlay click-through. Nice.
“NativeMethods” … Oh no.
internal static void InitTransparency(IntPtr handle)
{
    GWL_EXSTYLE_CLICKABLE = GetWindowLongPtr(handle, GWL_EXSTYLE);
    GWL_EXSTYLE_NOT_CLICKABLE = new IntPtr(
        GWL_EXSTYLE_CLICKABLE.ToInt64() | WS_EX_LAYERED | WS_EX_TRANSPARENT);
    Margins margins = Margins.FromRectangle(new Rectangle(-1, -1, -1, -1));
    DwmExtendFrameIntoClientArea(handle, ref margins);
}
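The click-through part boils down to bit arithmetic on the extended window style. Here is a minimal sketch of just that arithmetic, assuming the standard Win32 values for `WS_EX_LAYERED` and `WS_EX_TRANSPARENT`. The class `OverlayStyle` and its methods are hypothetical helpers and not the library's code. Actually applying the value would still need a P/Invoke call such as `SetWindowLongPtr`, which is left out here.

```csharp
using System;

// Hypothetical helper: pure bit math on a window's extended style value.
public static class OverlayStyle
{
    // Standard Win32 extended window style constants.
    public const long WS_EX_TRANSPARENT = 0x00000020L;
    public const long WS_EX_LAYERED = 0x00080000L;

    // Mouse input falls through the window to whatever is below it.
    public static long ToClickThrough(long exStyle) =>
        exStyle | WS_EX_LAYERED | WS_EX_TRANSPARENT;

    // Clears only the click-through bit and leaves every other style bit alone.
    public static long ToClickable(long exStyle) =>
        exStyle & ~WS_EX_TRANSPARENT;

    public static bool IsClickThrough(long exStyle) =>
        (exStyle & WS_EX_TRANSPARENT) != 0;
}
```

Note that the snippet above stores the original style and the click-through style as two precomputed values and presumably switches between them. Clearing a single bit, as done here, is just one alternative way to get back to a clickable window.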
Bye bye cross-platform dreams. Or at least easy cross-platform dreams. Somebody somewhere must have a cross-platform solution for that… And then I remembered that Godot is a thing. The Godot game engine seems to have an implementation for window transparency on Mac, Linux and Windows. Platform dependent, of course, but solved for all of them.
Why didn’t I just make it with Godot in the first place? Because only just now did I remember that “cross-platform high-level rendering solution” is just another way to say “cross-platform game engine”. At least now I have a goal for a V2 implementation.
How 2 OCR Translate Screen Thingies?
Ez Pz, capture the screen with:
var bmpScreenshot = new Bitmap(area.Width, area.Height, PixelFormat.Format32bppArgb);
using var g = Graphics.FromImage(bmpScreenshot);
g.CopyFromScreen(area.Location, new Point(0, 0), area.Size, CopyPixelOperation.SourceCopy);
But oh, hide the overlay first by setting the window state to hidden. And then show the overlay again after we got the translation.
And nothing happens.
No text on screen. The overlay didn’t come back with the debug boxes, and a quick look into the logs showed that the OCR returned no text. OK, one problem at a time. Why isn’t the overlay coming back?
Turns out that once you use Windows native functions to set GWL_EXSTYLE -> WS_EX_LAYERED | WS_EX_TRANSPARENT, SDL is not able to change the window back to normal after it has been hidden. If you don’t set it, SDL can toggle it freely. Why? I don’t know, and I’m too lazy to figure it out. So I thought: alright, let’s cheat by skipping the render of the ImGui instructions.
if (Visible) {
    Render();
}
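A minimal sketch of how the hide, capture and show sequence can be wrapped around such a `Visible` flag. The class `OverlayVisibility` and the `waitForNextFrame` callback are hypothetical. The assumption is that the render loop needs to present at least one frame without the ImGui windows before the capture runs, otherwise the old overlay content could still be on screen.

```csharp
using System;

// Hypothetical helper: hides the overlay for the duration of a capture.
public sealed class OverlayVisibility
{
    // The render loop checks this flag and skips drawing while it is false.
    public bool Visible { get; private set; } = true;

    public T CaptureWhileHidden<T>(Func<T> capture, Action waitForNextFrame)
    {
        Visible = false;
        try
        {
            // Assumption: give the render loop a chance to draw an empty frame first.
            waitForNextFrame();
            return capture();
        }
        finally
        {
            // Restore the overlay even if the capture throws.
            Visible = true;
        }
    }
}
```

The `try`/`finally` is there so that a failed capture or OCR call cannot leave the overlay permanently hidden.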
Brilliant. Aaaand, Google still reports: 0 Text. Let’s output the image I capture to a file and see what gets sent to the OCR service:
Nice.
“C# CopyFromScreen black image”
https://stackoverflow.com/questions/875563/graphics-copyfromscreen-returns-black-screen
Windows screenshot functions do not copy data from DirectX surfaces. You have to do that manually like described in this article.
Really? There were many other Stack Overflow posts from people having similar problems, but never an answer that solved it. I also tried setting the opacity of the whole window to 0%, but that yielded the same result. And if that SO poster is correct, then that’s to be expected.
… DirectX capture issues. Yes, that reminded me of puush.me, a screen capture tool for Windows that sometimes had issues with fullscreen games, also just producing a black picture. A few years ago I switched to ShareX, and I know for a fact that I have NEVER seen that behaviour with ShareX. So what does a ShareX capture of my overlay look like? (Spoiler: all prior screenshots were made with ShareX)
Good thing ShareX is written in C# and accomplishes what I fail to do: Time to copy paste pirate some code. Simply by searching for CopyFromScreen in the repo I found a class called Screenshot.cs which looked like it was responsible for capturing… screenshots. The method that search turned up was:
private Bitmap CaptureRectangleManaged(Rectangle rect)
{
    if (rect.Width == 0 || rect.Height == 0)
    {
        return null;
    }
    Bitmap bmp = new Bitmap(rect.Width, rect.Height, PixelFormat.Format24bppRgb);
    using (Graphics g = Graphics.FromImage(bmp))
    {
        // Managed can't use SourceCopy | CaptureBlt because of .NET bug
        g.CopyFromScreen(rect.Location, Point.Empty, rect.Size, CopyPixelOperation.SourceCopy);
    }
    return bmp;
}
Almost exactly what I’ve been doing, except for that CopyPixelOperation parameter. Which I also tried to no avail.
// Managed can’t use SourceCopy | CaptureBlt because of .NET bug
I tried searching around for any mentions of a bug concerning CopyFromScreen or CopyPixelOperation, and the only thing I could guess at was that, for some reason unknown to me, CopyPixelOperation is not declared as a [Flags] enum, so doing bit flag operations with it is not possible.
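A small check on that guess: as far as C# itself is concerned, the `|` operator is allowed on any enum, with or without `[Flags]`. The sketch below uses `RasterOp`, a hypothetical stand-in enum that mirrors the two Win32 raster operation values involved, so it runs without System.Drawing. If combining the values really fails in the managed method, the cause more likely sits in how `CopyFromScreen` validates its argument than in the enum declaration. That last part is an assumption I have not verified here.

```csharp
using System;

// Hypothetical stand-in for CopyPixelOperation, deliberately declared WITHOUT [Flags].
// The numeric values are the Win32 SRCCOPY and CAPTUREBLT raster operation codes.
public enum RasterOp
{
    SourceCopy = 0x00CC0020,
    CaptureBlt = 0x40000000,
}

public static class RasterOpDemo
{
    // Compiles and runs fine: bitwise OR works on enums regardless of [Flags].
    public static int Combine(RasterOp a, RasterOp b) => (int)(a | b);

    // True when the combined value is one of the names declared on the enum.
    public static bool IsDeclaredValue(int value) => Enum.IsDefined(typeof(RasterOp), value);
}
```

`RasterOpDemo.Combine(RasterOp.SourceCopy, RasterOp.CaptureBlt)` yields `0x40CC0020`. That is a perfectly valid integer, but it is not a declared member of the enum, so any API that only accepts declared members would reject it.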
So what does this CaptureBlt they wanted to use do?
Includes any windows that are layered on top of your window in the resulting image. By default, the image only contains your window. Note that this generally cannot be used for printing device contexts.
Kinda sounds like this is what I want. What I also noticed is that ShareX doesn’t use this Bitmap CaptureRectangleManaged(Rectangle rect) anywhere. But right above it is another method that does get used, as far as I can tell without actually debugging it:
private Bitmap CaptureRectangleNative(IntPtr handle, Rectangle rect, bool captureCursor = false)
{
    if (rect.Width == 0 || rect.Height == 0)
    {
        return null;
    }
    IntPtr hdcSrc = NativeMethods.GetWindowDC(handle);
    IntPtr hdcDest = NativeMethods.CreateCompatibleDC(hdcSrc);
    IntPtr hBitmap = NativeMethods.CreateCompatibleBitmap(hdcSrc, rect.Width, rect.Height);
    IntPtr hOld = NativeMethods.SelectObject(hdcDest, hBitmap);
    NativeMethods.BitBlt(hdcDest, 0, 0, rect.Width, rect.Height, hdcSrc, rect.X, rect.Y, CopyPixelOperation.SourceCopy | CopyPixelOperation.CaptureBlt);
    ...
    NativeMethods.SelectObject(hdcDest, hOld);
    NativeMethods.DeleteDC(hdcDest);
    NativeMethods.ReleaseDC(handle, hdcSrc);
    Bitmap bmp = Image.FromHbitmap(hBitmap);
    NativeMethods.DeleteObject(hBitmap);
    return bmp;
}
Ayy, even more native methods. Sorry, C++ guy, for ever making fun of you. So, much shameful code duplication later, I gave it another go and this is the result:
Yep. Nothing has changed.
Does my imported Windows DLL get substituted with something else because I’m using .NET Core and ShareX targets .NET Framework 4.7? I just can’t imagine that being the case, but instead of checking, I got curious how the .NET native screen grab would fare against other applications. I started the game Control in DirectX 12… and the screenshot worked. Risk of Rain 2 (a Unity game) was also captured without issue. So GPU-rendered content in general is not the issue, just my overlay?
Or am I the issue?
This is when I realised that for the game screen grabs I had manually entered the capture area as just (x: 0, y: 0, w: 1000, h: 1000). Which worked. What did my application try to grab? (x: 2475, y: 105, w: 1000, h: 1000). That is what I had hard-coded because I wanted to grab the image from my right screen (both my screens are 1920x1080), and I assumed the screen copy function would need an X > 1920 to target the right screen. No, it didn’t.
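A minimal sketch of a sanity check that would have caught this: before capturing, count how many pixels of the requested area actually lie on a monitor. `CaptureArea.VisiblePixels` is a hypothetical helper. The monitor rectangles are passed in by hand here, and both layouts used below are assumptions for illustration, since the real arrangement of the two screens is not stated. In a WinForms application the real bounds could come from `Screen.AllScreens`.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;

// Hypothetical helper: checks a capture area against the monitor layout.
public static class CaptureArea
{
    // Sums the overlap of the capture rectangle with every monitor.
    // Assumes the monitor rectangles do not overlap each other.
    public static long VisiblePixels(Rectangle capture, IEnumerable<Rectangle> monitors)
    {
        long total = 0;
        foreach (Rectangle monitor in monitors)
        {
            // Intersect returns an empty rectangle when there is no overlap.
            Rectangle overlap = Rectangle.Intersect(capture, monitor);
            total += (long)overlap.Width * overlap.Height;
        }
        return total;
    }
}
```

With the hard-coded area (2475, 105, 1000, 1000) and a layout where the primary screen is the right one, so the left screen sits at X = -1920 and the right one at X = 0, the result is 0: the whole area is off the desktop. In a layout where the primary screen is on the left and the second screen starts at X = 1920, the same area would overlap the right screen, minus the 25 rows that stick out below Y = 1080. Desktop coordinates are relative to the primary screen, so which case applies depends on which monitor is primary.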
Assumptions will be the death of me.