
Adapt your smartlink's auras to the underlying pdf content

Last update: 06/28/2022

Basic information and examples for creating advanced behaviours based on pdf text content.

Watch the smartlink example in action here.

Argoflow provides many helpers to parse and interpret pdf text and to manipulate auras:

Asynchronicity helpers: APIs to help you manage asynchronous reading.
Simple manipulators: APIs to subscribe to text parsing and control auras.

NB: you can also use batch mode to modify auras based on configuration files. However, the techniques below are useful when you do not know all the information ahead of generation.

Asynchronous loading

Always keep in mind that pdf pages are loaded asynchronously!

You therefore need techniques to force a page to load, or to run callbacks once the page is actually loaded.
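One way to run code only once a given page is available is to wrap the `pagerendered` event (see the summary at the end of this page) in a promise. This is a minimal sketch, assuming the listener receives the page number as in the `PDFZ.api.on("pagerendered", (page) => ...)` example and that listeners cannot be removed. `whenPageRendered` and the `PdfzEvents` interface are hypothetical names; the api object is passed in so the helper can be tried against a stub.

```typescript
// Hypothetical minimal shape of the event API, inferred from the documented
// PDFZ.api.on("pagerendered", (page) => ...) example; check the real typings.
interface PdfzEvents {
  on(event: "pagerendered", listener: (page: number) => void): void;
}

// Returns a promise that resolves the first time `pageNumber` is reported as rendered.
// Caveats (assumptions): listeners are never removed, and if the page was already
// rendered before this call, the promise only resolves on a later render.
function whenPageRendered(api: PdfzEvents, pageNumber: number): Promise<number> {
  return new Promise<number>((resolve) => {
    let done = false;
    api.on("pagerendered", (page: number) => {
      if (!done && page === pageNumber) {
        done = true;
        resolve(page);
      }
    });
  });
}
```

In a smartlink this would be used as `whenPageRendered(PDFZ.api, 2).then(() => { /* page 2 is rendered */ });`. When only the text is needed, `PDFZ.api.getPageText` already returns a promise that can be awaited directly.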

Making business decisions based on TextFacts

What if you want an aura's content to depend on what is inside the pdf?

In our example (see the invoice folder), we need to get information about an electricity contract.

Creating common TextFacts

The result of an information-extraction task can be used by several business modules.

For instance, the electricity consumption could be used both to choose which advertisements to display and to trigger a warning message when consumption is excessive compared to other customers in the neighbourhood.

That's why TextFacts are created ahead of the business rules, and you need to subscribe to their fulfilment.

A TextFact is an object composed of:

a name, which acts as a unique identifier;
a method, potentially complex, that parses text and produces a result object;
a list of pages on which this method should be applied. For instance, contract information could be on page 1 or 2, depending on the customer, so this TextFact should be declared for pages 1 and 2.

NB: whenever a page is rendered, Argoflow tries to assess the TextFacts that may be inferred from it. A TextFact that has not been assessed yet is assessed at that point; once assessed, it is never recalculated, since the pdf text content is fixed.

The method takes the following parameters as input:

textToStudy: string. In the common case, this is the text content of one of the pdf pages in the list above.
setValue: function. The function to call with the result once the TextFact has been assessed.
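Put together, the description above maps onto types like the following. `SetValue`, `TextFactMethod` and `TextFactDefinition` are illustrative names, not Argoflow's published typings, and the example fact is made up for the sake of the sketch.

```typescript
// Illustrative types only (hypothetical names); the shapes follow the description above.
type SetValue = (value: object) => void;

// Receives the text of one page and reports its result through setValue.
type TextFactMethod = (textToStudy: string, setValue: SetValue) => void;

interface TextFactDefinition {
  name: string;           // unique identifier
  method: TextFactMethod; // parsing logic, possibly complex
  pages: number[];        // pages the method may be applied to, e.g. [1, 2]
}

// Made-up example: reports whether the page mentions a contract number.
const mentionsContractNumber: TextFactDefinition = {
  name: "mentionsContractNumber",
  method: (textToStudy, setValue) => {
    setValue({ found: /contract number/i.test(textToStudy) });
  },
  pages: [1, 2],
};
```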

Parsing method example

You can find an example in invoice/textParsing

	Creating a parsing method can be complex. You can start by getting a page's content and working on it. The API that helps is PDFZ.api.getPageText, which returns a promise: `let pageText = ""; PDFZ.api.getPageText(1).then((p) => { pageText = p; console.log("ready"); });`
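As an illustration (not the code shipped in invoice/textParsing), a parsing method for an electricity-consumption TextFact could search the page text with a regular expression. The "consumption" label, the "1 234 kWh" number format and the `{ kWh }` result shape are assumptions about the invoice layout; check them against the actual output of PDFZ.api.getPageText.

```typescript
// Hypothetical parsing method for a "consumption" TextFact.
// Assumed layout: "Consumption: 1 234 kWh", with spaces as thousands separators.
// In JavaScript regexes, \s also matches non-breaking spaces (U+00A0).
function parseConsumption(textToStudy: string, setValue: (value: object) => void): void {
  const match = /consumption\s*:?\s*(\d[\d\s]*)\s*kwh/i.exec(textToStudy);
  if (match === null) {
    // Not on this page: setValue is not called. How Argoflow treats a method
    // that never calls setValue is not documented here (assumption).
    return;
  }
  const digits = (match[1] ?? "").replace(/\s/g, "");
  setValue({ kWh: Number(digits) });
}
```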

Registering a TextFact

TextFacts must be registered so that they can be used by other components and calculated only once.

PDFZ.api.rules.registerTextFact(ruleName: string, ruleMethod: function, pages: number[])
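For example, a consumption TextFact could be registered for pages 1 and 2. In this sketch, `RulesApi` is a hypothetical interface transcribed from the signature above, the TextFact name and the inline parser are illustrative, and the rules object is passed in so the code can run without a loaded document (in a smartlink you would pass `PDFZ.api.rules`).

```typescript
// Minimal interface transcribed from the documented signature (assumed typings).
interface RulesApi {
  registerTextFact(
    name: string,
    method: (textToStudy: string, setValue: (value: object) => void) => void,
    pages: number[],
  ): void;
}

// Hypothetical TextFact name, shared by the registration and by the rules subscribing to it.
const CONSUMPTION_FACT = "consumption";

function registerConsumptionFact(rules: RulesApi): void {
  rules.registerTextFact(
    CONSUMPTION_FACT,
    (textToStudy, setValue) => {
      // Simplified parser: first integer followed by "kWh" (assumed invoice layout).
      const match = /(\d+)\s*kWh/i.exec(textToStudy);
      if (match !== null) {
        setValue({ kWh: Number(match[1] ?? "") });
      }
    },
    [1, 2], // the contract information may be on page 1 or 2
  );
}
```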

Subscribing to TextFacts and taking action

Once TextFacts are registered, you can subscribe to their fulfilment and take the needed actions.

You subscribe to TextFacts like this:

PDFZ.api.rules.registerRule(textFactName: string, (ruleValue: object, hasError: boolean) => {
    // do what needs to be done
    // ruleValue is the value the TextFact method passed to setValue
})
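A rule built on a consumption TextFact might show a warning aura above a threshold. `subscribeConsumptionWarning`, the `"consumption"` fact name, the `"consumptionWarning"` aura name, the 5000 kWh threshold and the `{ kWh }` value shape are all hypothetical. `assetShow` is the aura API described further down this page; since its namespace is not stated there, it is passed in as a function alongside the rules object.

```typescript
// Assumed shapes, transcribed from the signatures on this page.
type RuleCallback = (ruleValue: object, hasError: boolean) => void;
interface RuleSubscriber {
  registerRule(textFactName: string, callback: RuleCallback): void;
}
type AssetShow = (name: string, duration?: number) => unknown;

const WARNING_THRESHOLD_KWH = 5000; // hypothetical business threshold

function subscribeConsumptionWarning(rules: RuleSubscriber, assetShow: AssetShow): void {
  rules.registerRule("consumption", (ruleValue, hasError) => {
    if (hasError) {
      return; // the TextFact could not be assessed: keep the warning hidden
    }
    // Assumption: the TextFact method called setValue({ kWh: number }).
    const kWh = (ruleValue as { kWh?: unknown }).kWh;
    if (typeof kWh === "number" && kWh > WARNING_THRESHOLD_KWH) {
      assetShow("consumptionWarning", 300); // fade in; duration unit not stated on this page
    }
  });
}
```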

The callback is called as soon as the TextFact has been assessed; in the standard case, just after the page on which the TextFact is based has been rendered.

NB: what if, while on page 1, you need a TextFact that is probably on page 9? Currently, only open pages are taken into account when computing facts. To force the assessment of the TextFacts on page 9, use PDFZ.api.rules.executeOnPage(9).
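For instance, the page-9 assessment could be triggered once page 1 has been rendered. This sketch only combines two calls documented on this page; the `PdfzLike` interface, the `prefetchFactsFrom` helper and the choice of waiting for page 1 are assumptions.

```typescript
// Assumed minimal shape combining two APIs documented on this page.
interface PdfzLike {
  on(event: "pagerendered", listener: (page: number) => void): void;
  rules: { executeOnPage(pageNumber: number): unknown };
}

// Once `triggerPage` is rendered, force the assessment of TextFacts tied to `factsPage`.
function prefetchFactsFrom(api: PdfzLike, triggerPage: number, factsPage: number): void {
  let requested = false;
  api.on("pagerendered", (page: number) => {
    if (!requested && page === triggerPage) {
      requested = true; // ask only once, even if the page is rendered again
      api.rules.executeOnPage(factsPage);
    }
  });
}
```

In a smartlink: `prefetchFactsFrom(PDFZ.api, 1, 9);`.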

You can see a working example in invoice/popinInfoContract

Dynamic positioning

We call "dynamic positioning" the techniques used to place an aura according to text positions in the pdf.

This can be a complex task, since finding the desired place without possible confusion can be hard.

We provide APIs to get the position of text:

PDFZ_api.getTextPosition(pageNumber, text)
PDFZ_api.getSplitTextPosition(pageNumber, text)
PDFZ.api.text.getContinuousOrSplitTextPosition(pageNumber, text): a combination of the two above
PDFZ.api.text.getRegexPosition(pageNumber: number, regex: string, flags: string = "")
and a few more, such as PDFZ_api.getTwoWordsPosition(pageNumber, text)

Then it's up to you to place your auras according to the returned rectangles in the page.

A flexible example is provided in invoice/dynamicPositioning
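As a rough illustration (not the invoice/dynamicPositioning code), an aura can be moved next to the first rectangle found for a label. The field names of `PageRectangle` and `AssetCoordinates` are not documented on this page, so `x`/`y`/`width`/`height` are placeholders, and x is assumed to grow to the right. `getTextPosition` and `assetMoveTo` are grouped into one hypothetical `PositioningApi` object for brevity; in a smartlink they may live on different namespaces.

```typescript
// ASSUMPTION: placeholder fields; check them against the real PDFZ_api typings.
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Hypothetical grouping of two documented calls, so the sketch can run against a stub.
interface PositioningApi {
  getTextPosition(pageNumber: number, searchString: string): Promise<Rect[]>;
  assetMoveTo(name: string, coordinates: Rect, duration?: number): Promise<void>;
}

// Moves an aura just to the right of the first occurrence of `label`.
// Resolves to false when the label is not found, so the caller can fall back.
async function placeNextToText(
  api: PositioningApi,
  pageNumber: number,
  label: string,
  auraName: string,
  gap = 5, // hypothetical spacing, in the units of the returned rectangles
): Promise<boolean> {
  const [first] = await api.getTextPosition(pageNumber, label);
  if (first === undefined) {
    return false;
  }
  // assetMoveTo ignores dimensions, so only x and y matter here.
  await api.assetMoveTo(auraName, { ...first, x: first.x + first.width + gap });
  return true;
}
```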

Some more info on PDFZ.api.text APIs

    /**
     *  Get the position of a text fragment on the page, as aura coordinates.
     *  Warning:
     *  Uses the text layer from pdf.js, so the results are equivalent to a manual select/copy and may not be perfect.
     */
getTextPosition(pageNumber: number, searchString: string): Promise<PDFZ_api.PageRectangle[]>

    /**
     *  Expects a 2-word string.
     *  Relies on assumptions that are probably valid for Latin characters with common fonts, in left-to-right languages.
     */
getSplitTextPosition(pageNumber: number, text: string): Promise<PDFZ_api.PageRectangle[]>

    /**
     * Sometimes two words appear to be on the same line but are in different blocks in the pdf file.
     * This method tries to find the text based on its visual position.
     */
getContinuousOrSplitTextPosition(pageNumber: number, text: string): Promise<PDFZ_api.PageRectangle[]>

    /**
     * Executes a regex on the text of the given page and returns the bounding boxes of the results.
     * @param pageNumber Page number where to search
     * @param regex Regular expression used for the search
     * @param flags Optional flags for the regular expression
     */
getRegexPosition(pageNumber: number, regex: string, flags: string = ""): Promise<PDFZ_api.PageRectangle[]>
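Because getRegexPosition takes the pattern as a string, backslashes would have to be doubled in a string literal. Reusing a `RegExp` literal's `source` avoids that. In this sketch the euro-amount pattern, the `findAmountPositions` helper and the `RegexPositionApi` interface are assumptions, and the result is typed `unknown[]` because the rectangle fields are not documented here.

```typescript
// Assumed minimal interface, transcribed from the signature above.
interface RegexPositionApi {
  getRegexPosition(pageNumber: number, regex: string, flags?: string): Promise<unknown[]>;
}

// Hypothetical pattern for amounts such as "12,50 €" or "12.50 EUR".
// Writing it as a RegExp literal avoids doubling every backslash.
const AMOUNT_PATTERN = /\d+[.,]\d{2}\s?(€|EUR)/;

function findAmountPositions(api: RegexPositionApi, pageNumber: number): Promise<unknown[]> {
  // .source gives the pattern text; flags are passed separately, as the API expects.
  return api.getRegexPosition(pageNumber, AMOUNT_PATTERN.source, "i");
}
```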

Some useful APIs to manage auras via script

    /**
     *  Get the coordinates of an asset.
     *  The asset is identified by name. If more than one asset has the same name, only the first one is used.
     *  If none is found, undefined is returned.
     */
getAssetPosition(name:string): AssetCoordinates

    /**
     *  Move an augmentation to the given position expressed as aura coordinates (dimensions are ignored).
     *  The asset is identified by name. If more than one asset has the same name, only the first one is used.
     *  Returns a promise which resolves when the action is done or stopped.
     */
assetMoveTo(name:string, coordinates:AssetCoordinates, duration:number = 0): Promise<void>

    /**
     *  Show/hide a hidden asset.
     *  The asset is identified by name. If more than one asset has the same name, only the first one is used.
     *  Returns a promise which resolves when the action is done or stopped.
     *  Set the duration if you want a fade-in/fade-out animation.
     */
assetShow(name:string, duration:number = 0)
assetHide(name:string, duration:number = 0)
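These calls can be chained: read where an aura currently is, move it, then fade it in. The sketch groups the three functions into a hypothetical `AuraApi` interface so it can run against a stub; `AssetCoordinates` is reduced to assumed `x`/`y` fields, and the aura name, offset and duration are illustrative. Since `assetShow` is described as returning a promise, it is awaited.

```typescript
// ASSUMPTION: only x/y of AssetCoordinates are modelled; check the real typings.
interface AssetCoordinates {
  x: number;
  y: number;
}

// Hypothetical grouping of the three documented functions.
interface AuraApi {
  getAssetPosition(name: string): AssetCoordinates | undefined;
  assetMoveTo(name: string, coordinates: AssetCoordinates, duration?: number): Promise<void>;
  assetShow(name: string, duration?: number): Promise<void>;
}

// Shifts a hidden aura by `dy` along the y axis, then fades it in.
// Resolves to false if no asset has that name.
async function revealShifted(api: AuraApi, name: string, dy: number): Promise<boolean> {
  const current = api.getAssetPosition(name);
  if (current === undefined) {
    return false;
  }
  await api.assetMoveTo(name, { x: current.x, y: current.y + dy }); // default duration (0)
  await api.assetShow(name, 500); // fade in; the duration unit is not stated on this page
  return true;
}
```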

To sum up: what happens when a page is rendered

When a new page is rendered:

An event is sent to whatever is listening; in particular, Argoflow retrieves the TextFacts based on this page and checks whether they are already assessed.

NB: you can listen to this event yourself: PDFZ.api.on("pagerendered", (page) => { console.log("page rendered " + page); });

If a TextFact has not been assessed yet and can be assessed on the newly rendered page, it is assessed and every rule listening to this TextFact is notified.
Notified rules run their callback function.
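To make the sequence concrete, here is a toy, in-memory model of it. It is not Argoflow's implementation, only a way to reason about the order of events under stated assumptions: a fact is assessed at most once, only on its declared pages, a page where the method does not call setValue leaves the fact unassessed, and subscribed rules are notified with the value. `ToyRuleEngine` is hypothetical, `hasError` is not modelled, and rules registered after a fact is assessed are not replayed here, which may differ from the real engine.

```typescript
type Method = (textToStudy: string, setValue: (value: object) => void) => void;
type Listener = (value: object) => void;

// Toy model for reasoning about the order of events; NOT Argoflow's implementation.
class ToyRuleEngine {
  private facts = new Map<string, { method: Method; pages: number[] }>();
  private values = new Map<string, object>();
  private listeners = new Map<string, Listener[]>();

  registerTextFact(name: string, method: Method, pages: number[]): void {
    this.facts.set(name, { method, pages });
  }

  registerRule(name: string, listener: Listener): void {
    const list = this.listeners.get(name) ?? [];
    list.push(listener);
    this.listeners.set(name, list);
  }

  // Stands for the "pagerendered" event, together with that page's text.
  onPageRendered(page: number, pageText: string): void {
    this.facts.forEach((fact, name) => {
      // Skip facts already assessed (the text never changes) or not declared for this page.
      if (this.values.has(name) || fact.pages.indexOf(page) === -1) {
        return;
      }
      fact.method(pageText, (value) => {
        if (this.values.has(name)) {
          return; // keep the first value only
        }
        this.values.set(name, value);
        // Notify every rule subscribed to this fact.
        (this.listeners.get(name) ?? []).forEach((listener) => listener(value));
      });
    });
  }
}
```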

What if the text is not exposed in the pdf?

If the pdf is just an image, we can integrate an OCR tool. Contact us for more information.