According to the code comments for MapCompose:
The functions can also return `None` in which case the output of that function is ignored for further processing over the chain.
    def __call__(self, value: Any, loader_context: MutableMapping[str, Any] | None = None) -> Iterable[Any]:
        if loader_context:
            context = ChainMap(loader_context, self.default_loader_context)
        else:
            context = self.default_loader_context
However, if I read the code correctly, MapCompose does not call the functions at all when the input is None; it only passes default_loader_context down the chain. This makes my code conceptually wrong: the functions that handle None are meaningless because MapCompose never executes them.
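
To illustrate that reading, here is a minimal sketch in plain Python. It is a simplified stand-in for the mapping loop, not the library's actual code: `to_iterable`, `map_compose_sketch` and `none_to_default` are hypothetical names, and the assumption (based on my understanding of the source) is that the input is first normalized to an iterable, with None becoming an empty one.

```python
from typing import Any, Callable, Iterable


def to_iterable(value: Any) -> Iterable[Any]:
    """Hypothetical helper: None becomes an empty list, scalars are wrapped."""
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return value
    return [value]


def map_compose_sketch(*functions: Callable[[Any], Any]) -> Callable[[Any], list]:
    """Simplified stand-in for MapCompose; not the library's actual code."""
    def process(value: Any) -> list:
        values = list(to_iterable(value))
        for func in functions:
            next_values = []
            for v in values:
                # A function returning None drops that value from the chain.
                next_values.extend(to_iterable(func(v)))
            values = next_values
        return values
    return process


calls = []  # records every value the function actually receives


def none_to_default(value):
    calls.append(value)
    return "fallback" if value is None else value


processor = map_compose_sketch(none_to_default)
result_for_none = processor(None)  # nothing to iterate over, function not called
result_for_text = processor("A Light in the Attic")
print(result_for_none, result_for_text, calls)
```

Under that assumption, a function written to turn None into a default never sees the None, so the fallback has to be applied before the value reaches the processor.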
@furas reframed the question as one about default values.
According to the changelog, support for default field values was removed in v0.14, that is, before 2012-10-18. However, the introduction of @dataclass items in v2.2.0 brought the concept back. The documentation states that attr.s items also allow you to define the type and default value of each field but, as with @dataclass, it does not provide an example. Additionally, the get() method has a default argument.
The get() method is the easy option: it replaces None with "get_method_default".
    start_urls = ["https://books.toscrape.com"]

    def parse(self, response):
        title = response.xpath("//h3/a/text()").get()
        # Nothing matches //h3/b here, so get() falls back to the default.
        none_get = response.xpath("//h3/b/text()").get(default="get_method_default")
@dataclass is questionable in my implementation: it returns "dataclass_field_default" only if none_field is deliberately left out of the constructor call; otherwise it returns None.
    @dataclass
    class NoneItem:
        title: str
        none_get: str
        none_field: str = "dataclass_field_default"

    def parse(self, response):
        title = response.xpath("//h3/a/text()").get()
        none_get = response.xpath("//h3/b/text()").get(default="get_method_default")
        none_field = response.xpath("//h3/b/text()").get()  # None: nothing matches
        item = NoneItem(
            title=title,
            none_get=none_get,
            # none_field=none_field  # left out so the dataclass default applies
        )
        yield item
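
The dataclass behavior can be reproduced without Scrapy. The sketch below uses only the standard library; the trimmed-down `NoneItem` and the sample title are placeholders for illustration. It shows that a field default is applied only when the argument is omitted, not when None is passed explicitly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NoneItem:
    title: str
    none_field: Optional[str] = "dataclass_field_default"


# Field omitted: the dataclass default is applied.
omitted = NoneItem(title="A Light in the Attic")

# Field passed explicitly as None (what get() returns when nothing matches):
# the default is not applied and None is stored as-is.
explicit_none = NoneItem(title="A Light in the Attic", none_field=None)

print(omitted.none_field)
print(explicit_none.none_field)
```

This is ordinary Python default-argument semantics, so as far as I can tell it is not specific to Scrapy items.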
An @attr.s() item is defined similarly and shows the same behavior.
In summary, for now get() is a suitable Scrapy method for replacing an occasional None with a default value.