Background
Metadata gives the Parser important details that help it interpret the data in each table correctly. You can combine multiple metadata by using '|' as a delimiter between them. For example, for "USD 1 1080.52", you can use number|currency to parse it as "11080.52 USD".
A distinction must first be made between a value, a header and a title. A value refers to the data in a cell (like an Excel cell) that we want to extract for analysis, while a header merely directs us to where the value is. A title, on the other hand, is (usually) a string of text above the headers. Some metadata apply the same concept to values and headers separately, so this distinction must be very clear for the config to do what we want it to.
We will also reference the sub- group very often. This term refers to the subfooter, subheader and subtitle metadata taken together as a whole.
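As a rough illustration of the combined form, the sketch below splits a metadata string on '|' and reproduces the number|currency example. The function names `split_metadata` and `parse_amount` are hypothetical and are not part of the Parser; the regular expressions are assumptions about what such an extraction might look like, not the Parser's actual logic.

```python
import re


def split_metadata(spec: str) -> list[str]:
    """Split a combined metadata string such as 'number|currency' into its parts."""
    return [part.strip() for part in spec.split("|") if part.strip()]


def parse_amount(raw: str) -> str:
    """Hypothetical stand-in for what number|currency extracts from a cell."""
    # Assumption: the currency is a three-letter uppercase code somewhere in the cell.
    currency = re.search(r"[A-Z]{3}", raw)
    # Assumption: the number is whatever remains once everything except
    # digits and the decimal point has been removed (this drops the space).
    number = re.sub(r"[^0-9.]", "", raw)
    return f"{number} {currency.group()}" if currency else number


print(split_metadata("number|currency"))
print(parse_amount("USD 1 1080.52"))
```

Stripping the stray space before reading the number is what turns "1 1080.52" into "11080.52" in this sketch.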
Some Examples of Metadata Usage
Metadata | Description | Examples |
---|---|---|
backfill | Opposite of forwardfill | |
bold | Used in conjunction with the sub- group of metadata. Especially useful when the subheader/title/footer is distinguishable from the rest of the tabular data because it is bold. | bold |
bottom | For when the statement-level data you want to extract is below the reference point/text. | |
case_sensitive | Table metadata that can be added to any table whose headers should be matched the old way, that is, case-sensitively, with capitalization taken into account. | |
computer_vision | Used with 'key' to indicate that row bounds should be determined by visual features such as row separators in the table. This is useful when no header group or set of header groups is guaranteed to appear in every row. The parser normally uses deterministic headers to determine where one row ends and the next begins; if that isn't possible because all headers are optional, computer_vision tells the parser to look for visual cues marking the beginning and end of rows, such as line separators or background color differences between rows. | |