A Cross-Platform C# OCR Translation Overlay Part 3

Posted by Programming Is Moe on Tuesday, October 27, 2020


The final missing feature before an alpha could be released is an application GUI. I’ve never used AvaloniaUI before, but it will be easy. Right? RIGHT?

What I need the GUI to do is the following:

  • Control overlay visibility
  • Activate the Capture + Translate main functionality
  • Set the capture region
  • Edit the Configuration Object

Getting started with Avalonia

Or not.

Several weeks later, I just can’t put up with Avalonia anymore. MVVM is pretty annoying to use by itself, but the lack of view conditionals is a deal breaker. The first thing I tried to solve was creating a GUI for the settings. As all the settings objects I use are POCOs anyway, I figured I could generate the settings view from the objects via reflection. With WPF I’m 100% sure it’d be a breeze, as it supports conditionals in XAML. Without them, I’d have to resort to creating a model and a view for every settings “type” I want to display.
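To make the per-type problem concrete, here is a minimal sketch of the reflection idea in plain C#, assuming the settings objects are POCOs with public read/write properties. OverlaySettings, SettingsPropertyViewModel, StringSettingsPropertyViewModel and SettingsViewModelFactory are hypothetical names; only BooleanSettingsPropertyViewModel matches a name used in the DataTemplate below. Generating the view models is the easy part: each property type still needs its own view model class and, on the Avalonia side, its own DataTemplate.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical settings POCO standing in for a real configuration object.
public class OverlaySettings
{
    public bool ShowOriginalText { get; set; } = true;
    public string TargetLanguage { get; set; } = "en";
    public int FontSize { get; set; } = 14;
}

// Base class: one concrete view model per supported property type.
public abstract class SettingsPropertyViewModel
{
    protected SettingsPropertyViewModel(object owner, PropertyInfo property)
    {
        Owner = owner;
        Property = property;
    }

    protected object Owner { get; }
    protected PropertyInfo Property { get; }

    // Used as the label, e.g. the CheckBox Content in a DataTemplate.
    public string Name => Property.Name;
}

public sealed class BooleanSettingsPropertyViewModel : SettingsPropertyViewModel
{
    public BooleanSettingsPropertyViewModel(object owner, PropertyInfo property)
        : base(owner, property) { }

    // Reads and writes straight through to the POCO via reflection.
    public bool Value
    {
        get => (bool)Property.GetValue(Owner);
        set => Property.SetValue(Owner, value);
    }
}

public sealed class StringSettingsPropertyViewModel : SettingsPropertyViewModel
{
    public StringSettingsPropertyViewModel(object owner, PropertyInfo property)
        : base(owner, property) { }

    public string Value
    {
        get => (string)Property.GetValue(Owner);
        set => Property.SetValue(Owner, value);
    }
}

public static class SettingsViewModelFactory
{
    // Maps every public read/write instance property to a view model by its type;
    // properties of unsupported types are skipped.
    public static List<SettingsPropertyViewModel> Create(object settings)
    {
        var result = new List<SettingsPropertyViewModel>();
        foreach (var property in settings.GetType().GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            if (!property.CanRead || !property.CanWrite)
                continue;

            if (property.PropertyType == typeof(bool))
                result.Add(new BooleanSettingsPropertyViewModel(settings, property));
            else if (property.PropertyType == typeof(string))
                result.Add(new StringSettingsPropertyViewModel(settings, property));
        }
        return result;
    }
}
```

Unsupported property types (like the int FontSize here) are simply skipped; supporting them means yet another view model plus another template.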

The longer I think about it, the clearer it gets that what annoys me most is… the XML. As a web dev working daily with HTML, it shouldn’t bother me as much as it does. But seeing stuff like this:

<Window xmlns="https://github.com/avaloniaui"
        xmlns:views="clr-namespace:Aris.Moe.Ocr.Overlay.Translate.Gui.Views"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        xmlns:vm="clr-namespace:Aris.Moe.Ocr.Overlay.Translate.Gui.ViewModels;assembly=Aris.Moe.Ocr.Overlay.Translate.Gui"
        xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
        xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
        mc:Ignorable="d" d:DesignWidth="800" d:DesignHeight="450"
        x:Class="Aris.Moe.Ocr.Overlay.Translate.Gui.Views.MainWindow"
        Icon="/Assets/avalonia-logo.ico"
        Title="Aris.Moe.Ocr.Overlay.Translate.Gui"
        Content="{Binding Settings}">
    
</Window>

<DataTemplate DataType="{x:Type local:BooleanSettingsPropertyViewModel}">
    <CheckBox Margin="4"
              IsChecked="False" 
              Content="{Binding Name}"/> 
</DataTemplate>

makes me want to jump out of a window. mc:, d:, xmlns:vm="clr-namespace:Aris.Moe.Ocr.Overlay.Translate.Gui.ViewModels;assembly=Aris.Moe.Ocr.Overlay.Translate.Gui". All these nonsensical shorthands. Why. And things like DataType="{x:Type local:BooleanSettingsPropertyViewModel}", which are so far removed from what XML usually looks like, just confuse the heck out of me. Anyone with even slight XAML experience would probably feel more at home, but this is killing my motivation for this side project.

Is that maybe how Python programmers feel when looking at C#?

Let’s give Qt a try

https://github.com/qmlnet/qmlnet to the rescue. Please be less verbose…

import QtQuick 2.9
import QtQuick.Layouts 1.3
import QtQuick.Controls 2.3
import QtQuick.Controls.Material 2.1

ApplicationWindow {
    id: window
    visible: true
    width: 640
    height: 480
    title: qsTr("Hello World")
    
    menuBar: MenuBar {
        Menu {
            title: qsTr("File")
            MenuItem {
                text: qsTr("&Open")
                onTriggered: console.log("Open action triggered");
            }
            MenuItem {
                text: qsTr("Exit")
                onTriggered: Qt.quit();
            }
            MenuItem {
                text: qsTr("Settings")
                onTriggered: {
                    stackView.push("Pages/Settings.qml")
                }
            }
        }
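    }

    // Hosts the pages pushed by the Settings menu item above
    StackView {
        id: stackView
        anchors.fill: parent
    }
}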

AND IT IS. Why did I ever doubt my lord and saviour Qt? Any Qt file I look at instantly feels way more readable to me. Motivation restored.

Wiring up a simple control interface

I already have an interface that exposes all the high-level methods for interacting with the application, so I only need buttons that trigger those methods:

using System;
using System.Threading.Tasks;

public interface IOcrTranslateOverlay : IDisposable
{
    Task TranslateScreen();
    void HideOverlay();
    void ToggleOverlay();
    void ShowOverlay();
    Task OcrScreen();
}

Creating a Qt “page” to interact with that is pretty simple:

Create a Qt page with QML

import QtQuick 2.6
import QtQuick.Controls 2.1
import Aris.Moe.Ocr.Overlay.Translate.Gui 1.1

ScrollablePage {
    Grid {
        columns: 6
        spacing: 12
        width: parent.width
    
        Button {
            text: 'Translate'
            onClicked: {
                var task = model.translateScreen()
                Net.await(task, function() {
                })
            }
        }
        
        Button {
            text: 'Hide Overlay'
            onClicked: {
                model.hideOverlay()
            }
        }
        
        Button {
            text: 'Show Overlay'
            onClicked: {
                model.showOverlay()
            }
        }
        ControlsModel /* Especially take note of this declaration */ {
            id: model
        }
    }
}

Define the C# ControlsModel class that the page declares, so qmlnet can instantiate it

using System.Threading.Tasks;

public class ControlsModel : IOcrTranslateOverlay
{
    private readonly IOcrTranslateOverlay _translateOverlay;
    
    public ControlsModel()
    {
        _translateOverlay = Program.Services.GetInstance<IOcrTranslateOverlay>();
    }

    public async Task TranslateScreen()
    {
        await _translateOverlay.TranslateScreen();
    }

    public void HideOverlay()
    {
        _translateOverlay.HideOverlay();
    }

    public void ToggleOverlay()
    {
        _translateOverlay.ToggleOverlay();
    }

    public void ShowOverlay()
    {
        _translateOverlay.ShowOverlay();
    }

    public async Task OcrScreen()
    {
        await _translateOverlay.OcrScreen();
    }

    public void Dispose()
    {
    }
}

And don’t forget to register the model under the module URI used in the page’s import statement:

Qml.Net.Qml.RegisterType<ControlsModel>("Aris.Moe.Ocr.Overlay.Translate.Gui", 1, 1);

I’m not sure whether abusing a static service locator is the right way to go, but for the foreseeable future it should not pose a problem.
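For readers unfamiliar with the pattern: a static service locator is a globally reachable registry that code asks for its dependencies, which is what Program.Services.GetInstance<IOcrTranslateOverlay>() does in the ControlsModel constructor. Because the QML engine creates ControlsModel from its declaration in the page, that constructor takes no parameters to inject through. The following is a hypothetical, minimal stand-in; ServiceLocator and its methods are illustrative names, not the container the project actually uses.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal service locator. A single static instance of something
// like this is what a call such as Program.Services.GetInstance<T>() resolves against.
public sealed class ServiceLocator
{
    private readonly Dictionary<Type, Func<object>> _factories = new Dictionary<Type, Func<object>>();

    // Always hands out the same instance.
    public void RegisterSingleton<TService>(TService instance) where TService : class
    {
        _factories[typeof(TService)] = () => instance;
    }

    // Invokes the factory on every resolve.
    public void RegisterTransient<TService>(Func<TService> factory) where TService : class
    {
        _factories[typeof(TService)] = () => factory();
    }

    public TService GetInstance<TService>() where TService : class
    {
        if (_factories.TryGetValue(typeof(TService), out var factory))
            return (TService)factory();

        throw new InvalidOperationException($"No registration for {typeof(TService).Name}.");
    }
}
```

The trade-off is the usual one: the dependency is hidden inside the constructor, and a missing registration only surfaces at runtime as an exception.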

Interactive capture area resize

What I want is this: once the user clicks “Set Screen Capture region”, the ImGui overlay should pop up, show the current capture region, let the user drag across the screen to define a new rectangle, and report that back to the main application for further handling. Alternatively, the user can abort by pressing Escape.

Currently the overlay only supports displaying text, and the whole resize flow would inflate the render code by quite a bit. So I decided to split text rendering and resize rendering into distinct “modes”:

protected override void Render()
{
    switch (_currentMode)
    {
        case OverlayMode.TextOverlay:
            _textOverlay.Render();
            break;
        case OverlayMode.ResizeTargetOverlay:
            _resizeOverlay.Render();
            break;
        default:
            throw new ArgumentOutOfRangeException(nameof(_currentMode), _currentMode, "Unknown overlay mode");
    }
}

interface IGuiMode
{
    void Render();
}
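A variation on the same idea, sketched under the assumption that every mode implements IGuiMode: instead of a switch, the overlay keeps one IGuiMode per OverlayMode in a dictionary and forwards Render() to the active one. RecordingMode and ModalOverlay are hypothetical names, and RecordingMode only counts calls where the real modes would issue ImGui draw calls.

```csharp
using System;
using System.Collections.Generic;

public enum OverlayMode
{
    TextOverlay,
    ResizeTargetOverlay
}

public interface IGuiMode
{
    void Render();
}

// Hypothetical stand-in for a real ImGui mode; it only records how often it was rendered.
public sealed class RecordingMode : IGuiMode
{
    public int RenderCount { get; private set; }
    public void Render() => RenderCount++;
}

// Owns one IGuiMode per OverlayMode and forwards each frame to the active one.
public sealed class ModalOverlay
{
    private readonly IReadOnlyDictionary<OverlayMode, IGuiMode> _modes;

    public ModalOverlay(IReadOnlyDictionary<OverlayMode, IGuiMode> modes)
    {
        _modes = modes;
    }

    public OverlayMode CurrentMode { get; set; } = OverlayMode.TextOverlay;

    public void Render()
    {
        if (!_modes.TryGetValue(CurrentMode, out var mode))
            throw new ArgumentOutOfRangeException(nameof(CurrentMode), CurrentMode, "No renderer registered for this mode");

        mode.Render();
    }
}
```

Adding a mode then means registering one more entry instead of extending the switch, while an unregistered mode still fails loudly, like the default branch above.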

Make the unclickable transparent window clickable again

For the drag mode to work, I need to make the overlay clickable again. Otherwise any drag would wreak havoc on the poor application below the overlay. Should be simple enough with the

SetOverlayClickable(IntPtr handle, bool wantClickable)

method I borrowed in Part 1. It isn’t.

Issue: the overlay “ignores” the first click

When the overlay is made clickable again, the first click on it doesn’t get registered. It doesn’t matter which state the overlay comes from, whether it was hidden before or is currently being displayed: one click is needed to gain focus before further clicks are processed.

What I’m currently doing to restore clickability is:

SetWindowLongPtr(handle, GWL_EXSTYLE, GWL_EXSTYLE_CLICKABLE); //user32.dll
SetFocus(handle); //user32.dll

But SetFocus really never manages to take the focus away from anything. After a huge amount of trial and error the following combination works:

SetWindowLongPtr(handle, GWL_EXSTYLE, GWL_EXSTYLE_CLICKABLE); //user32.dll
SetForegroundWindow(handle); //user32.dll
ShowWindow(handle, SW_SHOW); //user32.dll

That last ShowWindow call is needed regardless of whether the overlay was previously hidden with ShowWindow(handle, SW_HIDE). Without it, the focus does not switch to the overlay. Ever. I have no idea whether this is a quirk of GWL_EXSTYLE_NOT_CLICKABLE.
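For reference, the user32.dll calls above can be declared via P/Invoke roughly as follows. This is a sketch, not the code from Part 1: GWL_EXSTYLE_CLICKABLE and GWL_EXSTYLE_NOT_CLICKABLE are the post’s own constants and their values aren’t shown here, so the sketch instead reads the current extended style and toggles WS_EX_TRANSPARENT (together with WS_EX_LAYERED), the common way to make a window click-through. It assumes a 64-bit process, because user32.dll only exports GetWindowLongPtrW/SetWindowLongPtrW on 64-bit Windows. OverlayWin32 and WithClickThrough are hypothetical names.

```csharp
using System;
using System.Runtime.InteropServices;

public static class OverlayWin32
{
    // Documented Win32 constants.
    public const int GWL_EXSTYLE = -20;
    public const int SW_HIDE = 0;
    public const int SW_SHOW = 5;
    public const long WS_EX_TRANSPARENT = 0x00000020;
    public const long WS_EX_LAYERED = 0x00080000;

    // 64-bit only: 32-bit user32.dll exports GetWindowLongW/SetWindowLongW instead.
    [DllImport("user32.dll", EntryPoint = "GetWindowLongPtrW", SetLastError = true)]
    private static extern IntPtr GetWindowLongPtr(IntPtr hWnd, int nIndex);

    [DllImport("user32.dll", EntryPoint = "SetWindowLongPtrW", SetLastError = true)]
    private static extern IntPtr SetWindowLongPtr(IntPtr hWnd, int nIndex, IntPtr dwNewLong);

    [DllImport("user32.dll")]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool SetForegroundWindow(IntPtr hWnd);

    [DllImport("user32.dll")]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool ShowWindow(IntPtr hWnd, int nCmdShow);

    // Pure helper: clearing WS_EX_TRANSPARENT lets clicks hit the window,
    // setting it (plus WS_EX_LAYERED) lets them pass through. Other bits are preserved.
    public static long WithClickThrough(long exStyle, bool wantClickable) =>
        wantClickable
            ? exStyle & ~WS_EX_TRANSPARENT
            : exStyle | WS_EX_TRANSPARENT | WS_EX_LAYERED;

    public static void SetOverlayClickable(IntPtr handle, bool wantClickable)
    {
        var current = GetWindowLongPtr(handle, GWL_EXSTYLE).ToInt64();
        SetWindowLongPtr(handle, GWL_EXSTYLE, new IntPtr(WithClickThrough(current, wantClickable)));

        if (wantClickable)
        {
            // The combination that worked in the post; SetFocus alone never took focus.
            SetForegroundWindow(handle);
            ShowWindow(handle, SW_SHOW);
        }
    }
}
```

Keeping the bit manipulation in a pure function makes it testable without a window handle; the focus behaviour itself can only be checked on Windows.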

Issue: the overlay loses focus and gets minimized during the first run

Also, while debugging the other issue, I noticed that when the overlay is shown for the first time, it sometimes “loses” focus, doesn’t work as an overlay at all, and simply minimizes if you click beneath it. When that happens, tabbing to the overlay once fixes it permanently. (Tabbing to the overlay or not had no bearing on the previous issue, though.)

Once I had fixed the overlay-ignores-first-click issue with SetForegroundWindow, I tried putting that call into the code responsible for the initial SetOverlayClickable(handle, false) call as well. To my surprise, that fixed this issue too. Kinda. The downside is that the overlay initialization would cause the main user GUI to lose focus and start minimized, which sucks majorly and is not acceptable, so I removed the SetForegroundWindow call again.

After a bunch more trial and error, I figured out that the broken initial overlay was caused by my multi-monitor setup.

I had the control application on Screen 1 while the overlay was set to Screen 2. When my mouse was on the non-overlay screen, the overlay would bug out and show the issue. But if my mouse was resting on the overlay screen, the overlay worked as expected. And I mean just the mouse position: no clicking or focusing another window.

So I decided to ignore this issue, because I intend for the overlay to span all screens later on anyway, which will hopefully fix it. I just haven’t implemented screen size detection yet.

Interactive capture area resize for real

Yeah, well, the actual work of getting that feature working was not really interesting; only the bugs were, IMO. So here is what my code does:

  • Switch Overlay to interactive resize mode
  • Display the current capture area
  • Wait for mouse down and remember mouse position as StartPoint
  • Keep track of the current mouse position and draw a rectangle from StartPoint to currentMousePosition
  • Wait for mouse release event and remember mouse position as EndPoint
  • Calculate the rectangle from the two points
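  • Return that result

using System.Drawing; // assuming System.Drawing's Point, Rectangle and Size

// Extension methods like these must be declared inside a static class.
// Returns null when both points share an X or Y coordinate (zero-area rectangle).
public static Rectangle? ToRectangleWithUnknownPointOrder(this Point a, Point b)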
{
    var pointsAreOnTheSameLine = a.X == b.X || a.Y == b.Y;
    
    if (pointsAreOnTheSameLine)
        return null;
    
    var aHasLowestX = a.X < b.X;
    var aHasLowestY = a.Y < b.Y;
    
    var lowestX = aHasLowestX ? a.X : b.X;
    var lowestY = aHasLowestY ? a.Y : b.Y;
    
    var highestX = aHasLowestX ? b.X : a.X;
    var highestY = aHasLowestY ? b.Y : a.Y;

    var topLeft = new Point(lowestX, lowestY);
    var bottomRight = new Point(highestX, highestY);

    return topLeft.ToRectangle(bottomRight);
}

private static Rectangle ToRectangle(this Point topLeft, Point bottomRight)
{
    return new Rectangle(topLeft, new Size(bottomRight.X - topLeft.X, bottomRight.Y - topLeft.Y));
}
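The steps listed above can be sketched as a small input-driven state machine, assuming the overlay’s render loop forwards mouse and Escape events to it. CaptureRegionSelection and its On* methods are hypothetical names. It exposes the result as a Task via TaskCompletionSource, so the control GUI could simply await the new region, and it uses Rectangle.FromLTRB with Math.Min/Math.Max for the order-independent corner math. Like ToRectangleWithUnknownPointOrder, it yields null for zero-width or zero-height drags, which in this sketch is indistinguishable from pressing Escape.

```csharp
using System;
using System.Drawing;
using System.Threading.Tasks;

public sealed class CaptureRegionSelection
{
    private readonly TaskCompletionSource<Rectangle?> _result =
        new TaskCompletionSource<Rectangle?>(TaskCreationOptions.RunContinuationsAsynchronously);

    // Mouse position at mouse down; null while no drag is in progress.
    private Point? _start;

    public CaptureRegionSelection(Rectangle current) => Current = current;

    // Rectangle to draw this frame: the existing capture area or the live drag rectangle.
    public Rectangle Current { get; private set; }

    // Completes with the new region, or null when the selection is aborted.
    public Task<Rectangle?> Result => _result.Task;

    public void OnMouseDown(Point position) => _start = position;

    public void OnMouseMove(Point position)
    {
        if (_start is Point start)
            Current = Normalize(start, position);
    }

    public void OnMouseUp(Point position)
    {
        if (!(_start is Point start))
            return;

        _start = null;
        var rectangle = Normalize(start, position);
        var isEmpty = rectangle.Width == 0 || rectangle.Height == 0;
        _result.TrySetResult(isEmpty ? (Rectangle?)null : rectangle);
    }

    public void OnEscape() => _result.TrySetResult(null);

    // Order-independent rectangle from two corners.
    private static Rectangle Normalize(Point a, Point b) =>
        Rectangle.FromLTRB(Math.Min(a.X, b.X), Math.Min(a.Y, b.Y), Math.Max(a.X, b.X), Math.Max(a.Y, b.Y));
}
```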

[Image: showcase of the resize feature]

The next task will be to actually get the settings editor up, polish the UI a tiny bit and set up CI, and then we are ready for a first Windows alpha release.