This post is the fourth part of a series on creating an AI for the game Path of Exile © (PoE).
- A Deep Learning Based AI for Path of Exile: A Series
- Calibrating a Projection Matrix for Path of Exile
- PoE AI Part 3: Movement and Navigation
- PoE AI Part 4: Real-Time Screen Capture and Plumbing
- AI Plays Path of Exile Part 5: Real-Time Obstacle and Enemy Detection using CNNs in TensorFlow
As discussed in the first post of this series, the AI program takes a screenshot of the game and uses it to form predictions that are then used to update its internal state. In this post, efficient methods for capturing images of the game screen are explored.
Figure 1: Flowchart of AI Logic
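A first attempt uses ImageGrab.grab from the Pillow (PIL) library and times how long each capture takes:

```python
import time
from PIL import ImageGrab

def GetScreenshot1():
    #Capture the entire desktop as a PIL Image
    im = ImageGrab.grab()
    return im

if __name__ == "__main__":
    while True:
        t1 = time.time()
        GetScreenshot1()
        t2 = time.time()
        print('Elapsed: ' + str(t2 - t1))
```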
```
C:\Users\Public\Python>python SSTime.py
Elapsed: 0.1562650203704834
Elapsed: 0.15630722045898438
Elapsed: 0.1562347412109375
Elapsed: 0.14066553115844727
Elapsed: 0.17187738418579102
Elapsed: 0.1718764305114746
Elapsed: 0.15629959106445312
```
Unfortunately, the performance seen above leaves a bit to be desired. Even without any other processing, the program could handle at most roughly 6 frames per second (FPS). Further, in the above code the screen capture happens on the main thread, so the entire program blocks during each capture and no processing or interaction with the game can occur. Finally, only the game window (in windowed mode) should be captured, not the rest of the desktop.
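To put the ImageGrab numbers in perspective, the following sketch converts the measured durations into an upper bound on frame rate. It assumes the capture is the only per-frame work; max_fps is a hypothetical helper, and the timings are copied from the run above.

```python
# Capture durations (seconds) copied from the ImageGrab run above.
timings = [
    0.1562650203704834, 0.15630722045898438, 0.1562347412109375,
    0.14066553115844727, 0.17187738418579102, 0.1718764305114746,
    0.15629959106445312,
]

def max_fps(durations):
    """Upper bound on frame rate if each frame costs only the capture time."""
    mean = sum(durations) / len(durations)
    return 1.0 / mean

print(f"mean capture: {sum(timings) / len(timings):.4f} s")  # -> mean capture: 0.1585 s
print(f"max FPS: {max_fps(timings):.1f}")                    # -> max FPS: 6.3
```

The individual readings range from about 0.14 s to 0.17 s, so the bound moves between roughly 5.8 and 7.1 FPS depending on which sample is used.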
Using the Windows API
Several Windows API calls can alleviate these issues and improve performance.
```python
import numpy as np
import win32gui
import win32ui, win32con
from threading import Thread, Lock
import time

#Asynchronously captures screens of a window. Provides functions for accessing
#the captured screen.
class ScreenViewer:

    def __init__(self):
        self.mut = Lock()
        self.hwnd = None
        self.its = None         #Time stamp of last image
        self.i0 = None          #i0 is the latest image
        self.i1 = None          #i1 is used as a temporary variable
        self.cl = False         #Continue looping flag
        #Left, top, right, and bottom of the screen window
        self.l, self.t, self.r, self.b = 0, 0, 0, 0
        #Pixels to remove on the left, top, right, and bottom (border + extra crop)
        self.bl, self.bt, self.br, self.bb = 12, 31, 12, 20

    #Gets handle of window to view
    #wname: Title of window to find
    #Return: True on success; False on failure
    def GetHWND(self, wname):
        self.hwnd = win32gui.FindWindow(None, wname)
        if self.hwnd == 0:
            self.hwnd = None
            return False
        self.l, self.t, self.r, self.b = win32gui.GetWindowRect(self.hwnd)
        return True

    #Gets the latest image of the window
    def GetScreen(self):
        while self.i0 is None:  #Screen hasn't been captured yet
            pass
        with self.mut:
            s = self.i0
        return s

    #Gets the latest image of the window along with its timestamp
    def GetScreenWithTime(self):
        while self.i0 is None:  #Screen hasn't been captured yet
            pass
        with self.mut:
            s = self.i0
            t = self.its
        return s, t

    #Gets the screen of the window referenced by self.hwnd
    def GetScreenImg(self):
        if self.hwnd is None:
            raise Exception("HWND is none. HWND not called or invalid window name provided.")
        self.l, self.t, self.r, self.b = win32gui.GetWindowRect(self.hwnd)
        #Width: remove the 8 px border on each side plus 4 extra px per side (12 + 12 = 24)
        w = self.r - self.l - self.br - self.bl
        #Height: remove the 31 px title bar, the 8 px bottom border, and
        #12 extra px from the bottom (31 + 8 + 12 = 51)
        h = self.b - self.t - self.bt - self.bb
        wDC = win32gui.GetWindowDC(self.hwnd)
        dcObj = win32ui.CreateDCFromHandle(wDC)
        cDC = dcObj.CreateCompatibleDC()
        dataBitMap = win32ui.CreateBitmap()
        dataBitMap.CreateCompatibleBitmap(dcObj, w, h)
        cDC.SelectObject(dataBitMap)
        #First two tuples are the destination position and size (w, h)
        #Third tuple is the start position in the source
        cDC.BitBlt((0, 0), (w, h), dcObj, (self.bl, self.bt), win32con.SRCCOPY)
        bmInfo = dataBitMap.GetInfo()
        im = np.frombuffer(dataBitMap.GetBitmapBits(True), dtype = np.uint8)
        #Release the GDI resources
        dcObj.DeleteDC()
        cDC.DeleteDC()
        win32gui.ReleaseDC(self.hwnd, wDC)
        win32gui.DeleteObject(dataBitMap.GetHandle())
        #Bitmap has 4 channels like: BGRA. Discard alpha and flip order to RGB
        return im.reshape(bmInfo['bmHeight'], bmInfo['bmWidth'], 4)[:, :, -2::-1]
```
In the GetHWND function above, win32gui.FindWindow(None, wname) looks up the handle of the game window by its title. For this game, wname should be “Path of Exile”, i.e. the call is win32gui.FindWindow(None, "Path of Exile").
With a handle to the game window, win32gui.GetWindowRect(self.hwnd) gives the position of the game window on the screen. These values are needed to translate mouse positions within the game window (size 800×600) into absolute screen coordinates (the screen is usually something like 1920×1080).
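A minimal sketch of that translation, assuming the image pixel (0, 0) corresponds to the window rectangle's top-left corner offset by the cropped left and top borders (self.bl and self.bt, which are also the BitBlt source offset in GetScreenImg). The function names and the window position are hypothetical.

```python
def window_to_screen(x, y, rect, border=(12, 31)):
    """Map a pixel (x, y) in the captured game image to absolute screen coordinates.

    rect   -- (left, top, right, bottom), as returned by win32gui.GetWindowRect
    border -- (left, top) pixels cropped before the image starts (self.bl, self.bt)
    """
    left, top, _, _ = rect
    bl, bt = border
    return left + bl + x, top + bt + y

def screen_to_window(sx, sy, rect, border=(12, 31)):
    """Inverse mapping: absolute screen coordinates back to image pixels."""
    left, top, _, _ = rect
    bl, bt = border
    return sx - left - bl, sy - top - bt

rect = (100, 50, 916, 689)  # hypothetical window rectangle on the desktop
print(window_to_screen(0, 0, rect))      # -> (112, 81)
print(window_to_screen(400, 300, rect))  # -> (512, 381)
```

The rectangle is re-read in GetScreenImg on every capture, so using the latest self.l and self.t keeps the mapping correct if the window is moved. The resulting pair could then be passed to a mouse-control call such as win32api.SetCursorPos.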
The GetScreenImg function above is the code that actually captures an image of the game screen and stores it in a NumPy array. There are 3 main things to note about the above code. First, windows on the screen have a border that is not useful for the AI and can be discarded. The variables self.bl, self.br, self.bt, and self.bb store the number of pixels removed from the left, right, top, and bottom of the window respectively. Second, some extra pixels are discarded from the edges of the image so that the height and width of the image are multiples of 7 and 9 respectively. The reason for this is covered in the next post in this series. Third, bitmap data from the Windows API is organized as groups of 4 8-bit integers like BGRA for the blue, green, red, and alpha channels respectively. Most Python imaging libraries expect 3 channels like RGB. The final line in GetScreenImg reverses the order of the channels and discards the alpha channel, which is not used here.
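The following sketch checks the cropping arithmetic and the channel reordering without any Windows calls. It assumes an 800×600 client area surrounded by 8 px side and bottom borders and a 31 px title bar (as described in the code comments), giving an outer window rectangle of 816×639; cropped_size and bgra_to_rgb are hypothetical helpers.

```python
import numpy as np

# Pixels cropped by ScreenViewer (self.bl, self.bt, self.br, self.bb).
BL, BT, BR, BB = 12, 31, 12, 20

def cropped_size(win_w, win_h):
    """Width and height of the captured image for a win_w x win_h window rectangle."""
    return win_w - BL - BR, win_h - BT - BB

# Assumed outer size: 800 + 8 + 8 wide, 600 + 31 + 8 tall.
w, h = cropped_size(816, 639)
print(w, h, w % 9, h % 7)  # -> 792 588 0 0

def bgra_to_rgb(buf, height, width):
    """Reshape raw BGRA bytes and reorder the channels to RGB, dropping alpha."""
    im = np.frombuffer(buf, dtype=np.uint8).reshape(height, width, 4)
    return im[:, :, -2::-1]  # channel indices 2, 1, 0 -> R, G, B

# A single pixel with B=10, G=20, R=30, A=255.
print(bgra_to_rgb(bytes([10, 20, 30, 255]), 1, 1))  # -> [[[30 20 10]]]
```

The slice returns a view with a negative stride rather than a copy. If a downstream library requires contiguous memory, np.ascontiguousarray can be applied to the result.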
Since the AI repeatedly captures images of the screen, it makes sense to perform the capture in a separate thread and to provide an interface through which other threads can read the latest image in an asynchronous and thread-safe manner. This way, an image of the screen is always available nearly instantaneously. This can be accomplished using the Thread and Lock objects from the threading library.
```python
#Begins recording images of the screen
def Start(self):
    #if self.hwnd is None:
    #    return False
    self.cl = True
    thrd = Thread(target = self.ScreenUpdateT)
    thrd.start()
    return True

#Stop the async thread that is capturing images
def Stop(self):
    self.cl = False

#Thread used to capture images of screen
def ScreenUpdateT(self):
    #Keep updating screen until terminating
    while self.cl:
        self.i1 = self.GetScreenImg()
        with self.mut:
            self.i0 = self.i1       #Update the latest image in a thread safe way
            self.its = time.time()
```
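The same producer/consumer idea can be tried without a game window. The sketch below swaps in a threading.Condition for the plain Lock so that readers sleep until the first frame exists instead of spinning in a busy-wait loop, and Stop joins the capture thread. LatestFrame and fake_capture are hypothetical names; fake_capture stands in for GetScreenImg.

```python
import itertools
import threading
import time

class LatestFrame:
    """Keep the most recent frame produced by a background capture thread."""

    def __init__(self, capture_fn):
        self._capture = capture_fn          # zero-argument callable returning a frame
        self._cond = threading.Condition()  # guards _frame/_stamp and signals readers
        self._frame = None
        self._stamp = None
        self._running = False
        self._thread = None

    def start(self):
        self._running = True
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def stop(self):
        # Ask the loop to exit, then wait for the capture thread to finish.
        self._running = False
        if self._thread is not None:
            self._thread.join()

    def _loop(self):
        while self._running:
            frame = self._capture()  # slow capture runs outside the lock
            with self._cond:
                self._frame, self._stamp = frame, time.time()
                self._cond.notify_all()  # wake readers waiting for a first frame

    def get(self, timeout=None):
        """Return (frame, timestamp), waiting until at least one frame exists."""
        with self._cond:
            self._cond.wait_for(lambda: self._frame is not None, timeout=timeout)
            return self._frame, self._stamp


# Usage with a fake capture function that "grabs" a numbered frame every 5 ms.
counter = itertools.count()

def fake_capture():
    time.sleep(0.005)
    return next(counter)

viewer = LatestFrame(fake_capture)
viewer.start()
frame, stamp = viewer.get(timeout=1.0)
viewer.stop()
print(frame, stamp is not None)
```

The expensive capture happens outside the lock, so readers only ever hold it long enough to copy a reference, which is the same property the Lock-based ScreenUpdateT relies on.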
Because the capture now runs on its own thread, timing measurements should be taken inside the ScreenUpdateT function. A quick-and-dirty approach and the resulting timings follow.
```python
#Thread used to capture images of screen
def ScreenUpdateT(self):
    #Keep updating screen until terminating
    while self.cl:
        t1 = time.time()
        self.i1 = self.GetScreenImg()
        print('Elapsed: ' + str(time.time() - t1))
        with self.mut:
            self.i0 = self.i1       #Update the latest image in a thread safe way
            self.its = time.time()
```
```python
import time
from ScreenViewer import ScreenViewer

if __name__ == "__main__":
    sv = ScreenViewer()
    sv.GetHWND('CodeLibrary')
    sv.Start()
    time.sleep(1)
    sv.Stop()
```

```
C:\Users\Public\Python>python SSTime.py
Elapsed: 0.015668153762817383
Elapsed: 0.015635251998901367
Elapsed: 0.015609025955200195
Elapsed: 0.015636205673217773
Elapsed: 0.015625
Elapsed: 0.015612602233886719
```
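Single time.time() readings can be coarse on Windows, where that clock has historically ticked in steps of about 15.6 ms. A slightly more robust sketch averages many calls using time.perf_counter(), a high-resolution clock intended for measuring short durations. time_calls is a hypothetical helper, and the workload below is a stand-in.

```python
import time

def time_calls(fn, n=100):
    """Average duration of fn() in seconds over n calls, measured with perf_counter."""
    start = time.perf_counter()
    for _ in range(n):
        fn()
    return (time.perf_counter() - start) / n

# Compare the advertised resolution of the two clocks on this machine.
print("time.time resolution:   ", time.get_clock_info("time").resolution)
print("perf_counter resolution:", time.get_clock_info("perf_counter").resolution)

# Stand-in workload; with a ScreenViewer this might be time_calls(sv.GetScreenImg).
avg = time_calls(lambda: sum(range(10000)), n=50)
print(f"average: {avg * 1000:.3f} ms")
```

Timing GetScreenImg directly this way (after GetHWND succeeds, without calling Start) would measure the capture cost without the capture thread competing for the same window.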
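The timings are roughly an order of magnitude faster, which puts the theoretical maximum at around 64 FPS (1 / 0.0156 s). The measured values coincide with the roughly 15.6 ms resolution of time.time() on Windows, so they should be treated as approximate. The main AI program accesses images of the screen through a data member of type ScreenViewer, similar to the code that follows.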
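```python
from ScreenViewer import ScreenViewer

class Bot:

    def __init__(self, name):
        self.sv = ScreenViewer()    #For getting screens of the game
        #...

    def Run(self):
        #...
        self.sv.Start()
        #...
        try:
            while True:
                #...
                I = self.sv.GetScreen()   #Latest captured frame (RGB numpy array)
                ProcessScreen(I)          #Update the AI's internal state from the frame
                #...
        finally:
            #...
            self.sv.Stop()                #Always stop the capture thread
        return
```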
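ScreenViewer also provides GetScreenWithTime, which the loop above does not use. Because the capture thread and the AI loop run at different rates, the loop can read the same frame more than once. A minimal sketch that processes a frame only when its timestamp changes is shown below; FakeViewer (a stand-in that produces a new frame every 20 ms) and run_new_frames are hypothetical.

```python
import time

class FakeViewer:
    """Hypothetical stand-in for ScreenViewer that produces a new frame every 20 ms."""

    def __init__(self):
        self._t0 = time.time()

    def GetScreenWithTime(self):
        tick = int((time.time() - self._t0) / 0.02)
        return f"frame-{tick}", self._t0 + tick * 0.02

def run_new_frames(viewer, process, duration=0.2):
    """Call process(frame) only for frames whose timestamp differs from the last one."""
    last_stamp = None
    handled = 0
    end = time.time() + duration
    while time.time() < end:
        frame, stamp = viewer.GetScreenWithTime()
        if stamp == last_stamp:
            time.sleep(0.001)  # nothing new yet; yield briefly instead of reprocessing
            continue
        last_stamp = stamp
        process(frame)
        handled += 1
    return handled

seen = []
n = run_new_frames(FakeViewer(), seen.append)
print(n, len(seen) == len(set(seen)))
```

Comparing timestamps works here because ScreenUpdateT stores a new self.its only when it stores a new image, so an unchanged timestamp means an unchanged frame.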
The next post in this series will cover using convolutional neural networks (CNN) to process the images of the screen to update the state of the AI.