79151785

Date: 2024-11-02 23:47:37

I have a similar filter that appears to work, although there is one frustrating hiccup, which I'll explain below.

My approach is to pass the names of the classes I want to keep as a comma-separated list in a metadata variable on the pandoc command line. Inside the filter, I parse that list into a lookup table using LPeg, then apply the same filter function to `Blocks` and `Inlines` lists, using that table to decide which elements to keep.
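The set-based inclusion test described above can be sketched in plain Lua, runnable outside pandoc. This is an illustration only: it swaps LPeg for `string.gmatch`, uses plain arrays of strings in place of pandoc elements, and `parse_keeplist`, `keep_classes` and the `color-only` class are hypothetical names.

```lua
-- Plain-Lua sketch (no pandoc needed). string.gmatch stands in for the
-- LPeg split used in the filter; names here are hypothetical.

-- Parse "a,b, c" into the set { a = true, b = true, c = true }.
-- Commas and whitespace both act as separators; empty items are skipped.
local function parse_keeplist(s)
  local set = {}
  for name in (s or ""):gmatch("[^,%s]+") do
    set[name] = true
  end
  return set
end

-- Same rule as keep_elem: keep elements with no classes, or with at
-- least one class in the set. `classes` is a plain array of strings.
local function keep_classes(classes, set)
  if classes == nil or #classes == 0 then
    return true
  end
  for _, name in ipairs(classes) do
    if set[name] then
      return true
    end
  end
  return false
end

local set = parse_keeplist("other,bw-only")
print(keep_classes({}, set))               -- true  (no classes)
print(keep_classes({ "bw-only" }, set))    -- true
print(keep_classes({ "color-only" }, set)) -- false (hypothetical class)
```

Using a table keyed by class name makes each membership check a single lookup instead of a scan over the whole keep list.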

Given this filter (classfilter.lua on the LUA_PATH):

```lua
-- Split the argument list with LPeg; this function is taken from the
-- LPeg documentation: https://www.inf.puc-rio.br/~roberto/lpeg/
local function split(s, sep)
    sep = lpeg.P(sep)
    local elem = lpeg.C((1 - sep) ^ 0)
    local p = lpeg.Ct(elem * (sep * elem) ^ 0) -- make a table capture
    return lpeg.match(p, s)
end

local keeplist = {}
-- This function goes inside the Meta filter; the keeplist table
-- is then available to the rest of the filters to consult
local function collect_vars(m)
    local arg = m.keeplist
    if arg == nil then
        return -- no keeplist given: leave the table empty
    end
    -- metadata from YAML front matter arrives as inlines, not a string
    if type(arg) ~= "string" then
        arg = pandoc.utils.stringify(arg)
    end
    for _, classname in ipairs(split(arg, ",")) do
        keeplist[classname] = true
    end
end

local function keep_elem(_elem)
    -- keep if no class designation
    if not _elem.classes or #_elem.classes == 0 then
        return true
    end
    -- keep if any class name is in keeplist
    for _, classname in ipairs(_elem.classes) do
        if keeplist[classname] ~= nil then
            return true
        end
    end
    -- don't keep otherwise
    return false
end

local function filter_list_by_classname(_elems)
    -- iterate backwards so that removing an element does not make the
    -- loop skip the element that moves into its slot
    for _elemidx = #_elems, 1, -1 do
        if not keep_elem(_elems[_elemidx]) then
            _elems:remove(_elemidx)
        end
    end
    return _elems
end

-- filters in a returned list run one after another,
-- which forces the Meta filter to run first
return { { Meta = collect_vars }, { Inlines = filter_list_by_classname, Blocks = filter_list_by_classname } }
```
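Removing elements from a list while iterating over it forward with `ipairs` (a `for _elemidx, _elem in ipairs(_elems)` loop that calls `_elems:remove`) shifts the next element into the current slot, so that element is never checked; when two removable elements are adjacent, the second can survive. The plain-Lua sketch below uses `table.remove` on ordinary tables rather than pandoc lists, and `remove_forward`, `remove_backward` and `is_x` are hypothetical names chosen for the illustration. It shows the effect and the usual fix of iterating from the end.

```lua
-- Forward removal: after table.remove(list, i), the element that was at
-- i + 1 now sits at i, but the loop moves on to i + 1 and never sees it.
local function remove_forward(list, drop)
  for i, v in ipairs(list) do
    if drop(v) then
      table.remove(list, i)
    end
  end
  return list
end

-- Backward removal: only already-visited positions shift, so every
-- element is checked exactly once.
local function remove_backward(list, drop)
  for i = #list, 1, -1 do
    if drop(list[i]) then
      table.remove(list, i)
    end
  end
  return list
end

local function is_x(v) return v == "x" end

print(table.concat(remove_forward({ "a", "x", "x", "b" }, is_x), ","))  -- a,x,b
print(table.concat(remove_backward({ "a", "x", "x", "b" }, is_x), ",")) -- a,b
```

An alternative that avoids in-place removal altogether is to build a new list containing only the elements to keep.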

And with this pandoc command:

`pandoc -f markdown -t markdown --lua-filter=classfilter.lua <YOUR EXAMPLE INPUT> -M 'keeplist=other,bw-only'`

I get the following output:

```markdown
## Images

![Generic image](generic_image.png)

![Generic image](generic_image.png){width="30px"}

<figure>

<figcaption>Color only image</figcaption>
</figure>

<figure>

<figcaption>Color only image</figcaption>
</figure>

<figure>

<figcaption>Color only image</figcaption>
</figure>

![BW only image](bw_image.png){.bw-only}

![BW only image](bw_image.png){.bw-only width="30px"}

![BW only image](bw_image.png){.bw-only width="30px"}

## Blocks

::: other
Block that shouldn't be filtered.
:::

::: Block that shouldn't be filtered. :::

::: bw-only
BW only block.
:::

## Spans

[Span that shouldn't be filtered]{.other}

[BW only span]{.bw-only}

## Links

[Link that shouldn't be filtered](link.html)

[Link that shouldn't be filtered](link.html){.other}

[BW only link](link.html){.bw-only}
```

This appears to be the desired output, except for the `<figure>` tags, which I haven't been able to remove. Assuming that my `Inlines` filter removes only the image from a Figure and leaves the caption intact, I tried walking the list with a separate `Figure` filter to find Figure blocks with empty contents and replace them with an empty table, but that didn't change the result at all. Concretely, adding the following to the `filter_list_by_classname` function, before the loop:

```lua
_elems:walk({
    Figure = function(a_figure)
        if not a_figure.content[1].content[1] then
            return {}
        else
            return a_figure
        end
    end
})
```

did nothing.
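One likely reason the `walk` attempt had no visible effect is that `walk` returns a new, modified list rather than changing `_elems` in place, and the snippet discards that return value. A different route is a separate `Figure` filter that runs after the class filter and drops figures whose content is empty. This is a minimal sketch, untested against the example input, assuming (as the output suggests) that each removed image leaves an empty `Plain` or `Para` block inside its Figure; `figure_is_empty` and `drop_empty_figure` are hypothetical names.

```lua
-- Hypothetical helper: a figure counts as empty when every block in its
-- content has an empty `content` list (e.g. a Plain whose Image was removed).
local function figure_is_empty(fig)
  for _, blk in ipairs(fig.content) do
    -- blocks without a `content` field, or with content left, keep the figure
    if blk.content == nil or #blk.content > 0 then
      return false
    end
  end
  return true
end

-- Returning an empty list from an element filter function deletes the
-- element (caption included); returning nil leaves it unchanged.
local function drop_empty_figure(fig)
  if figure_is_empty(fig) then
    return {}
  end
  return nil
end
```

To wire it in, the returned list would become `{ { Meta = collect_vars }, { Inlines = filter_list_by_classname, Blocks = filter_list_by_classname }, { Figure = drop_empty_figure } }`. Because filters in a returned list run one after another over the whole document, the figures are only checked after the images have already been removed.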

So maybe this could be the start of a solution.

Posted by: Walther Stolzing