79566632

Date: 2025-04-10 12:26:43
Score: 0.5

According to the code comments for MapCompose:

The functions can also return `None` in which case the output of that function is ignored for further processing over the chain.

    def __call__(self, value: Any, loader_context: MutableMapping[str, Any] | None = None) -> Iterable[Any]:
        if loader_context:
            context = ChainMap(loader_context, self.default_loader_context)
        else:
            context = self.default_loader_context

However, if I read the code correctly, MapCompose does not call the functions at all when the input is None; the excerpt above only selects default_loader_context as the context to pass down the chain. This makes my code conceptually wrong: the functions I wrote to handle None are meaningless because MapCompose never executes them.
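To make that reading concrete, here is a minimal stand-in written with the standard library only. It is not the itemloaders source: `to_list` and `map_compose_sketch` are hypothetical names, and the sketch assumes that the real processor first normalises its input so that None becomes an empty list.

```python
from typing import Any, Callable, List


def to_list(value: Any) -> List[Any]:
    """Normalise a value: None -> [], list/tuple -> list, anything else -> [value]."""
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def map_compose_sketch(*functions: Callable[[Any], Any]) -> Callable[[Any], List[Any]]:
    """Hypothetical stand-in for MapCompose: apply each function to every value in turn."""

    def process(value: Any) -> List[Any]:
        values = to_list(value)  # assumption: a None input becomes an empty list here
        for func in functions:
            next_values: List[Any] = []
            for v in values:
                # A function that returns None contributes nothing to the next step
                next_values += to_list(func(v))
            values = next_values
        return values

    return process


calls: List[Any] = []


def replace_none(value: Any) -> Any:
    """Record every call, then try to substitute a fallback for None."""
    calls.append(value)
    return "fallback" if value is None else value


processor = map_compose_sketch(replace_none, str.upper)
result_for_none = processor(None)        # no values, so no function is called
result_for_text = processor(["a", "b"])  # each value goes through both functions
```

Under that assumption, `replace_none` is never called for a None input, so its fallback branch cannot run. A default has to be supplied before the value reaches the processor.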


@furas reframed the question as one about default values.

According to the changelog, support for default field values was removed in v0.14, that is, before 2012-10-18. However, the introduction of @dataclass items in v2.2.0 brought the concept back. The documentation states that attr.s items also allow defining the type and default value of each field but, as with @dataclass, it does not provide an example. Additionally, the get() method has a default argument.

The get() method is the easy option: its default argument replaces None with "get_method_default".

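    start_urls = ["https://books.toscrape.com"]

    def parse(self, response):
        # No trailing comma here, otherwise title becomes a one-element tuple
        title = response.xpath("//h3/a/text()").get()
        # When nothing matches, get() returns the default instead of None
        none_get = response.xpath("//h3/b/text()").get(default="get_method_default")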

@dataclass is questionable in my implementation because the item gets "dataclass_field_default" only if the none_field argument is deliberately left out of the constructor call; if None is passed explicitly, the field stays None.

    from dataclasses import dataclass

    @dataclass
    class NoneItem:
        title: str
        none_get: str
        none_field: str = "dataclass_field_default"

    def parse(self, response):
        title = response.xpath("//h3/a/text()").get()
        none_get = response.xpath("//h3/b/text()").get(default="get_method_default")
        none_field = response.xpath("//h3/b/text()").get()  # None when nothing matches

        item = NoneItem(
            title=title,
            none_get=none_get,
            # none_field=none_field  # uncommenting passes None and overrides the default
        )
        yield item

An @attr.s() item is defined similarly and shows the same behavior.
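The dataclass behavior is plain Python and can be reproduced without Scrapy. In the sketch below, `build_item` is a hypothetical helper that I did not use in my spider, and the title string is a placeholder; the idea is to drop None values before calling the constructor so that the field default applies.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class NoneItem:
    title: str
    none_field: Optional[str] = "dataclass_field_default"


def build_item(**fields: Any) -> NoneItem:
    """Hypothetical helper: drop None values so the dataclass defaults apply."""
    present = {name: value for name, value in fields.items() if value is not None}
    return NoneItem(**present)


# Passing None explicitly overrides the default, which is the behavior I observed
explicit_none = NoneItem(title="Example title", none_field=None)

# Filtering None out first lets the default take effect
filtered = build_item(title="Example title", none_field=None)
```

Here `explicit_none.none_field` stays None, while `filtered.none_field` is "dataclass_field_default". The helper fails with a TypeError if a required field such as title is None, which may or may not be what you want.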


In summary, for now get() with its default argument is a suitable way in Scrapy to replace an occasional None with a default value.

Reasons:
  • Long answer (-1):
  • Has code block (-0.5):
  • User mentioned (1): @furas
  • Self-answer (0.5):
  • Low reputation (0.5):
Posted by: Dmitry Borisoglebsky